Abstract. In this article we propose a general framework for normal approximation using Stein's method. We introduce the new concept of Stein couplings and we show that it lies at the heart of popular approaches such as the local approach, exchangeable pairs, size biasing and many other approaches. We prove several theorems with which normal approximation for the Wasserstein and Kolmogorov metrics becomes routine once a Stein coupling is found. To illustrate the versatility of our framework we give applications in Hoeffding's combinatorial central limit theorem, functionals in the classic occupancy scheme, neighbourhood statistics of point patterns with a fixed number of points and functionals of the components of randomly chosen vertices of subcritical Erdős-Rényi random graphs. In all these cases, we use new, non-standard couplings.
1. Introduction



Since its introduction in the early 1970s, Stein's method has undergone a vigorous development. First used by Stein (1972) for normal approximation of m-dependent sequences, it was gradually modified and generalized to other distributions and different dependency settings. Among the few important contributions that influenced the theoretical understanding of the method is the book by Stein (1986), where the concepts of auxiliary randomization and exchangeable pairs are introduced. Another cornerstone is the generator method, independently introduced by Barbour (1988) and Götze (1991), which allows for an adaptation of the method to more complicated approximating distributions, such as compound Poisson distributions, Poisson point processes and Gaussian diffusions. Diaconis and Zabell (1991) give a connection between Stein's method and orthogonal polynomials; see also Goldstein and Reinert (2005a), who related this approach to distributional transformations. A more recent development was initiated by Nourdin and Peccati (2009) and related articles, where a fruitful theory for normal approximation for functionals of Gaussian measures, Rademacher sequences and Poisson measures is developed. Despite these achievements, relatively little effort has been put into building up a rigorous theoretical framework for the method in order to unify and generalize the variety of known results.

In particular for normal approximation, a wide range of different approaches has appeared over the last decades. Among the most prominent approaches are the local approach, dating back to Stein (1972) and extensively studied by Chen and Shao (2004); the exchangeable pairs approach by Stein (1986), further developed by Rinott and Rotar (1997), Röllin (2007b) and Röllin (2008a), with a variety of applications such as weighted U-statistics, the anti-voter model on finite graphs and models in statistical mechanics; and the size-bias and zero-bias couplings of Goldstein and Rinott (1996) and Goldstein and Reinert (1997), respectively. Another approach was introduced by Chatterjee (2008) for functionals of independent random variables (we will discuss a more general version of this approach, called interpolation to independence, in Section 3.4). In addition to these abstract approaches, many other ad-hoc constructions have been used to tackle specific problems, such as Ho and Chen (1978) and Bolthausen (1984) for the combinatorial central limit theorem, Barbour, Karoński and Ruciński (1989) and Röllin (2008b) for refined versions of local dependence, and Barbour and Eagleson (1986) and Zhao, Bai, Chao and Liang (1997) for double indexed permutation statistics. However, despite all these achievements, a unifying framework is still missing and connections between the different approaches given in the literature are vague at best. Survey articles such as Reinert (1998) and Rinott and Rotar (2000), although they attempt to discuss Stein's method systematically, illustrate the key issue here: for each approach a separate theorem is proved, and then typically only for one specific metric.

The reason for this seems to be the following. For all of these approaches, the involved random variables either have to satisfy some more or less abstract conditions, or some specific properties of the dependence structure are exploited to obtain the results. This could be a defining equation as in the size-biasing approach, a linearity assumption on a conditional expectation for exchangeable pairs, a local dependence structure, or other properties and conditions. Depending on the specific form of these conditions, the quantities arising from using Stein's method seemingly have to be handled differently. Although in simpler applications it might be clear how to manipulate these expressions directly in an ad-hoc way in order to apply Stein's method successfully, this is less feasible in more complex situations. Chatterjee (2008) proposes an approach for functionals of finite collections of independent random variables. His approach comes at the cost of a rather complicated bound, and for many applications optimal results (in terms of moment conditions and metrics) may not be obtained that way (cf. the difference between Corollaries 2.2 and 2.3 below). It is crucial to exploit properties of the random variables at hand in order to express the error bounds in terms of simple and manageable expressions. To achieve this, we will explore which abstract key conditions allow for a successful implementation of Stein's method for normal approximation, a question that has not yet been addressed in the literature. Our framework provides such a set of conditions along with a variety of "plug-in" type theorems. Not only does our framework show the connection between all the above-mentioned approaches, but it also introduces some crucial generalisations, and hence flexibility, into Stein's method for normal approximation.

Our main tool is that of couplings; more specifically, we introduce the new concept of Stein couplings. We provide general approximation theorems with respect to the Wasserstein and Kolmogorov metrics, where the error terms are expressed in terms of the relationship between the involved coupled random variables. Although we relate the different known approaches via Stein couplings, our applications also show that the distinction usually made between these approaches is rather artificial. Most of the couplings used in our applications cannot be clearly assigned to one of the known approaches, but emerge naturally from the problem at hand and are therefore constructed in a more ad-hoc way. Nevertheless, in Section 3 we attempt to give a systematic overview of the different coupling constructions, being well aware that other ways of classifying these couplings may be equally reasonable.



1.1. A short introduction to Stein's method. We assume throughout this article that $W$ is a random variable whose distribution is to be approximated by a standard normal distribution, and we also assume that $W$ has finite variance. Stein's method for normal approximation is based on the fact that, for all, say, Lipschitz continuous functions $f$, we have

$$\mathbb{E}\{Zf(Z)\} = \mathbb{E}f'(Z) \tag{1.1}$$

if and only if $Z \sim N(0,1)$. Now, if it is the case that

$$\mathbb{E}\{Wf(W)\} \approx \mathbb{E}f'(W) \tag{1.2}$$

for many functions $f$, we would expect $W$ to be close to the normal distribution. With Stein's method we can make this heuristic idea rigorous in the following sense. Assume that, in order to measure the closeness of $\mathcal{L}(W)$ and $\mathcal{L}(Z)$, we would like to bound

$$\mathbb{E}h(W) - \mathbb{E}h(Z) \tag{1.3}$$

for some function $h$ (take for example the half-line indicators for the Kolmogorov metric). Solving the so-called Stein equation

$$f'(x) - xf(x) = h(x) - \mathbb{E}h(Z) \tag{1.4}$$

for $f = f_h$, we can then express the quantity (1.3) as

$$\mathbb{E}h(W) - \mathbb{E}h(Z) = \mathbb{E}\{f'(W) - Wf(W)\} = \mathbb{E}\mathcal{A}f(W), \tag{1.5}$$

where $\mathcal{A}$ is the operator defined by $\mathcal{A}f(x) := f'(x) - xf(x)$. Hence, $\mathbb{E}\mathcal{A}f(W)$ measures the error in (1.2), and the function $f$ relates this error to the error of the approximation $\mathbb{E}h(W) \approx \mathbb{E}h(Z)$ via (1.5) (see Röllin (2007a) for a more detailed discussion).
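As a concrete illustration of (1.4), the following sketch evaluates the bounded solution of the Stein equation for the Kolmogorov test function $h(x) = I[x \le z]$ and checks numerically, via a central difference, that it satisfies (1.4). The closed form used below is the standard one from the Stein's method literature rather than something derived in this article. The helper names (`stein_solution_indicator`, `stein_residual`) and the choice $z = 0.7$ are ours and serve only as an example.

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def Phi_bar(x):
    """Upper tail 1 - Phi(x), computed without cancellation."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def stein_solution_indicator(x, z):
    """Bounded solution f_z of f'(x) - x f(x) = I[x <= z] - Phi(z)."""
    c = SQRT_2PI * math.exp(x * x / 2)
    if x <= z:
        return c * Phi(x) * Phi_bar(z)
    return c * Phi(z) * Phi_bar(x)


def stein_residual(x, z, step=1e-5):
    """f'(x) - x f(x) - (h(x) - E h(Z)), with f' from a central difference."""
    f = lambda t: stein_solution_indicator(t, z)
    fprime = (f(x + step) - f(x - step)) / (2 * step)
    h = 1.0 if x <= z else 0.0
    return fprime - x * f(x) - (h - Phi(z))


z = 0.7
grid = [-4 + 0.05 * k for k in range(161)]
points = [x for x in grid if abs(x - z) > 1e-3]  # f_z' jumps at x = z, where h jumps
max_residual = max(abs(stein_residual(x, z)) for x in points)
print(f"max |residual| on the grid: {max_residual:.2e}")
```

The grid points next to $z$ are skipped because $h$, and hence $f_z'$, is discontinuous there, so a central difference across $z$ would not approximate $f_z'$.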



Let us make (1.2) more precise. One way to express (1.2) is to assume that there are two random variables $T_1$ and $T_2$ such that

$$\mathbb{E}\{Wf(W)\} = \mathbb{E}\{T_1 f'(W + T_2)\} \tag{1.6}$$

for all $f$. Equations of the form (1.6) are often called Stein identities, as they characterise, in some sense, the distribution of $W$. There are two important special cases of (1.6). If, for example, $W$ is a (smooth) functional of a Gaussian field, Nourdin and Peccati (2009) use Malliavin calculus to derive (1.6) with $T_2 = 0$, and they give a more or less explicit expression for $T_1$. In contrast, in the zero-biasing approach as introduced by Goldstein and Reinert (1997), it is assumed that (1.6) holds for a specific $T_2$ with $T_1 = 1$.

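To illustrate the line of argument leading to a final bound in its simplest form, let us look at the case $T_2 = 0$. Writing $\mathbb{E}^W$ for the conditional expectation given $W$, we have

$$\mathbb{E}\mathcal{A}f(W) = \mathbb{E}\{(1 - T_1)f'(W)\} = \mathbb{E}\{(1 - \mathbb{E}^W T_1)f'(W)\}, \tag{1.7}$$

so that the error in the normal approximation is given by

$$|\mathbb{E}h(W) - \mathbb{E}h(Z)| = |\mathbb{E}\mathcal{A}f(W)| \le \|f'\|\,\mathbb{E}|1 - \mathbb{E}^W T_1|,$$

and, if $\mathbb{E}T_1 = 1$, the expectation $\mathbb{E}|1 - \mathbb{E}^W T_1|$ is usually further bounded by $\sqrt{\operatorname{Var}\mathbb{E}^W T_1}$. It is not difficult to show that $\|f'\| \le 2\|h\|$ (where $\|\cdot\|$ denotes the supremum norm). Hence, we obtain the final bound

$$|\mathbb{E}h(W) - \mathbb{E}h(Z)| \le 2\|h\|\sqrt{\operatorname{Var}\mathbb{E}^W T_1}.$$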
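The step from $\mathbb{E}|1 - \mathbb{E}^W T_1|$ to the variance in this bound is the Cauchy-Schwarz inequality combined with $\mathbb{E}(\mathbb{E}^W T_1) = \mathbb{E}T_1 = 1$. Written out:

$$
\begin{aligned}
\mathbb{E}\big|1 - \mathbb{E}^W T_1\big|
  &\le \Big(\mathbb{E}\big(1 - \mathbb{E}^W T_1\big)^2\Big)^{1/2}
  && \text{(Cauchy-Schwarz)}\\
  &= \Big(\mathbb{E}\big(\mathbb{E}^W T_1 - \mathbb{E}(\mathbb{E}^W T_1)\big)^2\Big)^{1/2}
  && \text{(since } \mathbb{E}(\mathbb{E}^W T_1) = \mathbb{E}T_1 = 1\text{)}\\
  &= \sqrt{\operatorname{Var}\mathbb{E}^W T_1}.
\end{aligned}
$$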

Note that this specific form of the bound, involving $\operatorname{Var}\mathbb{E}^W T_1$, has been used implicitly many times in the literature around Stein's method, but was probably first made explicit by Cacoullos, Papathanasiou and Utev (1994). In Chatterjee (2009) and Nourdin, Peccati and Reinert (2009), a connection with Poincaré inequalities was made, where a bound on $\operatorname{Var}\mathbb{E}^W T_1$ is called a second-order Poincaré inequality.
1.2. An outline of our approach. Roughly speaking, we propose a general, probabilistic method of deriving identities of the form (1.6) and more refined versions of them, where we will make use of both random variables $T_1$ and $T_2$, as we will show below. Based on this, we provide theorems to obtain bounds on the accuracy of a normal approximation of $W$.



The key idea is that of auxiliary randomisation, introduced by Stein (1986). To this end we construct a random variable $W'$, which we imagine to be a 'small perturbation' of $W$. It is important to emphasize that we make no assumptions about the distribution of $W'$ or the joint distribution at this point (in particular, no exchangeability is assumed a priori). We also note that we never attempt to couple $W$ and $W'$ so that $W' = W$ almost surely; in fact, such a coupling would contain no useful information for our purposes. It is crucial that some randomness remains (see Ross (to appear) for a discussion of how to choose the perturbation optimally in some examples using exchangeable pairs), and it will become clear that the difference $D := W' - W$ contains essential information about $W$. This can be seen as a local-to-global approach, where we deduce global properties of $W$ from the behaviour of local perturbations, as these are often easier to handle.

When dealing with Stein's method, it becomes clear that we cannot expect normal approximation results from an arbitrary coupling $(W, W')$, and we need to impose some structure. To achieve this, we generalize an idea which goes back to Stein (1972) and introduce a third random variable $G$. For reasons that will hopefully become apparent in the course of this article, we then make the following key definition.

Definition 1.1. Let $(W, W', G)$ be a coupling of square integrable random variables. We call $(W, W', G)$ a Stein coupling if

$$\mathbb{E}\{Gf(W') - Gf(W)\} = \mathbb{E}\{Wf(W)\} \tag{1.8}$$

for all functions $f$ for which the expectations exist.



Before explaining how this will help in finding an identity of the form (1.6), let us first discuss some standard Stein couplings. As a simple example where (1.8) holds, assume that $(W, W')$ is an exchangeable pair and that, for some $\lambda > 0$, we have

$$\mathbb{E}^W(W' - W) = -\lambda W. \tag{1.9}$$

If we set $G = \frac{1}{2\lambda}(W' - W)$, it is easy to see that (1.8) is satisfied (see Section 3.1 for more details). Equation (1.9) is the well-known linear regression condition introduced by Stein in Diaconis (1977) and Stein (1986), and (1.8) can be seen as a generalization of it. We will show in Section 3 that (1.8) is the key to normal approximation using Stein's method and that many approaches in the literature in fact (implicitly) establish (1.8).
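A short derivation shows why the choice $G = \frac{1}{2\lambda}(W' - W)$ works. It uses only exchangeability, which gives $\mathbb{E}\{(W' - W)f(W')\} = -\mathbb{E}\{(W' - W)f(W)\}$ in the second step, and (1.9) in the last step:

$$
\begin{aligned}
\mathbb{E}\{Gf(W') - Gf(W)\}
  &= \tfrac{1}{2\lambda}\,\mathbb{E}\{(W' - W)(f(W') - f(W))\}\\
  &= -\tfrac{1}{\lambda}\,\mathbb{E}\{(W' - W)f(W)\}\\
  &= -\tfrac{1}{\lambda}\,\mathbb{E}\{f(W)\,\mathbb{E}^W(W' - W)\}\\
  &= \mathbb{E}\{Wf(W)\}.
\end{aligned}
$$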



Remark 1.2. Let $(W, W', G)$ be a Stein coupling. If we choose $f(x) = 1$, we see from (1.8) that $\mathbb{E}W = 0$. If we choose $f(x) = x$, we furthermore have $\mathbb{E}(GD) = \operatorname{Var}W$.

Note that the statement $\mathbb{E}(GD) = \operatorname{Var}W$ is well known in a special case. If $W'$ is an independent copy of $W$, then $(W, W', (W' - W)/2)$ is a Stein


STEIN COUPLINGS FOR NORMAL APPROXIMATION 






coupling (use the exchangeable pairs approach above with $\lambda = 1$) and, hence,

$$\operatorname{Var}W = \tfrac{1}{2}\mathbb{E}(W - W')^2 = \mathbb{E}(GD)$$

is the well-known way to express the variance of $W$ in terms of two independent copies. However, this coupling is not useful for our purpose, as $|W' - W|$ is not 'small'. This example also shows that a Stein coupling by itself by no means guarantees proximity to the normal distribution.

It is often not difficult to construct a 'small perturbation' $W'$ of $W$. Three main techniques have been used in the literature: deletion, replacement and duplication (note, however, that this is often done implicitly and not expressed in terms of couplings). In many typical situations, $W$ is a functional of a family of random variables $X_1, \dots, X_n$, and $W'$ can be constructed by picking a random index $I$ independently of the $X_j$ and then perturbing at position $I$, either by removing $X_I$, by replacing it, or by adding another, related random variable. If the $X_i$ are not independent, the other random variables typically have to be 'adjusted' appropriately. Let us quickly illustrate the three techniques in the simplest situation, that of a sum of independent random variables. Let $W = \sum_{i=1}^n X_i$, where the $X_i$ are assumed to be independent, centred and such that $\operatorname{Var}X_i = 1/n$. In what follows, let $I$ be uniformly distributed over $\{1, \dots, n\}$ and independent of all else.

Deletion. Define $W' = W - X_I$, that is, remove $X_I$ from $W$. If we choose $G = -nX_I$, we have

$$\mathbb{E}\{Gf(W')\} = -\sum_{i=1}^n \mathbb{E}\{X_i f(W - X_i)\} = 0$$

due to the independence assumption (each $X_i$ is centred and independent of $W - X_i$). Further,

$$-\mathbb{E}\{Gf(W)\} = \sum_{i=1}^n \mathbb{E}\{X_i f(W)\} = \mathbb{E}\{Wf(W)\},$$

so that, indeed, (1.8) is satisfied. This construction is very powerful under local dependence, but it can also be used in other contexts; see Section 4.1.

Replacement. Let $X_1', \dots, X_n'$ be independent copies of the $X_i$ and define $W' = W - X_I + X_I'$. Then it is not difficult to see that $(W, W')$ is an exchangeable pair and that (1.9) holds with $\lambda = 1/n$, which corresponds to $G = \frac{n}{2}(X_I' - X_I)$; this implies (1.8). The idea of replacing (or re-sampling) is one of the most fruitful in Stein's method. It will often lead to an exchangeable pair $(W, W')$ and can be applied in situations where the functional is no longer a sum or where some weak global dependence structure is present. It lends itself naturally to situations where $W$ can be interpreted as the state of a stationary Markov chain and $W'$ as the state one step ahead; this observation was first made explicit by Rinott and Rotar (1997). It is important to note here that the choice of $G$ is by no means restricted to multiples of $W' - W$, even if $(W, W')$ is exchangeable. This added flexibility is one of the key observations in this article. Note also that the $X_i'$ need not be copies of the $X_i$ and may well have a different distribution, so that $(W, W')$ need not be exchangeable. In the size-biasing approach, indeed, the $X_i'$ will typically be chosen to have the size-biased distribution of $X_i$.
Duplication. To the best of our knowledge, this method has only been used by Chen (1998). Let the $X_i'$ be as in the previous paragraph, and let $W' = W + X_I'$ along with $G = n(X_I' - X_I)$. Since $W'$ is symmetric in $X_I$ and $X_I'$, we have

$$\mathbb{E}\{X_I' f(W')\} = \mathbb{E}\{X_I f(W')\},$$

hence $\mathbb{E}\{Gf(W')\} = 0$. Further, $\mathbb{E}\{X_I' f(W)\} = 0$, so that $\mathbb{E}\{Gf(W)\} = -\mathbb{E}\{Wf(W)\}$, and (1.8) follows.
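The three constructions can be checked mechanically. The sketch below is a small illustration, not part of the original argument. It enumerates exactly all outcomes of $X_1, \dots, X_n$, $X_1', \dots, X_n'$ and $I$ for a centred two-point distribution with variance $1/n$, and it compares both sides of (1.8) for deletion, replacement and duplication. The choices $n = 4$, success probability $p = 0.3$, the test functions, and the helper names `two_point_law` and `stein_coupling_gaps` are ours.

```python
import itertools
import math


def two_point_law(n, p):
    """Centred two-point law with variance 1/n: (B - p) / sqrt(n p (1 - p))."""
    s = math.sqrt(n * p * (1 - p))
    return [(-p / s, 1 - p), ((1 - p) / s, p)]


def stein_coupling_gaps(f, n=4, p=0.3):
    """Return E{G f(W') - G f(W)} - E{W f(W)} for the three couplings."""
    law = two_point_law(n, p)
    lhs = {"deletion": 0.0, "replacement": 0.0, "duplication": 0.0}
    rhs = 0.0
    for xs in itertools.product(law, repeat=n):          # X_1, ..., X_n
        for xps in itertools.product(law, repeat=n):     # X'_1, ..., X'_n
            prob = math.prod(q for _, q in xs) * math.prod(q for _, q in xps)
            x = [v for v, _ in xs]
            xp = [v for v, _ in xps]
            w = sum(x)
            rhs += prob * w * f(w)                       # contributes E{W f(W)}
            for i in range(n):                           # I uniform on the indices
                q = prob / n
                # Deletion: W' = W - X_I, G = -n X_I
                g = -n * x[i]
                lhs["deletion"] += q * g * (f(w - x[i]) - f(w))
                # Replacement: W' = W - X_I + X'_I, G = (n/2)(X'_I - X_I)
                g = 0.5 * n * (xp[i] - x[i])
                lhs["replacement"] += q * g * (f(w - x[i] + xp[i]) - f(w))
                # Duplication: W' = W + X'_I, G = n (X'_I - X_I)
                g = n * (xp[i] - x[i])
                lhs["duplication"] += q * g * (f(w + xp[i]) - f(w))
    return {name: value - rhs for name, value in lhs.items()}


for f in (math.sin, math.tanh, lambda t: t ** 3):
    print(stein_coupling_gaps(f))
```

All reported gaps should be zero up to floating-point rounding. A deliberately asymmetric law ($p \neq 1/2$) is used so that the check does not rely on symmetry of the summands.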

As can be readily seen from these examples, there are typically many possible ways to construct Stein couplings, and it depends on the application which perturbation will give optimal results. Typically, two of the three random variables $W$, $W'$ and $G$ are easy to construct (usually $(W, W')$ or $(W, G)$), and the challenge then lies in constructing the third random variable so that the triple is a Stein coupling. However, as we will show in Section 3, by making abstract assumptions about the structure of $W$, many situations can be handled by standard couplings. In a concrete application, therefore, one often only needs to concentrate on constructing a coupling that satisfies some abstract conditions rather than finding a Stein coupling from scratch, although the latter can give interesting additional insight into the problem at hand and may also lead to improved bounds.

We need to emphasize at this point that our abstract theorems will also hold if (1.8) is not satisfied, so that, a priori, any coupling $(W, W', G)$ can be used. However, useful bounds can only be expected if (1.8) holds at least approximately, and the accuracy with which (1.8) holds enters explicitly into our error bounds. This parallels the introduction of a remainder term in Condition (1.9) by Rinott and Rotar (1997, Eq. (1.7)).



Let us now go back and show how a coupling $(W, W', G)$ helps in obtaining a Stein identity of the form (1.6). By the fundamental theorem of calculus we have

$$f(W') - f(W) = \int_0^D f'(W + t)\,dt, \tag{1.10}$$

so that, multiplying (1.10) by $G$ and taking expectations, we obtain

$$\mathbb{E}\{Gf(W') - Gf(W)\} = \mathbb{E}\Big\{G\int_0^D f'(W + t)\,dt\Big\}. \tag{1.11}$$

If $U$ is an independent random variable with uniform distribution on $[0, 1]$, we can also write this as

$$\mathbb{E}\{Gf(W') - Gf(W)\} = \mathbb{E}\{GDf'(W + UD)\}. \tag{1.12}$$

If $(W, W', G)$ is a Stein coupling, the left-hand side of (1.12) equals $\mathbb{E}\{Wf(W)\}$ and, hence, (1.6) is satisfied with $T_1 = GD$ and $T_2 = UD$. Our generality comes at a cost. As is clear from (1.6), if $T_2$ is non-trivial we cannot easily condition $T_1$ on $W$ (or on an appropriate larger $\sigma$-algebra) as done in (1.7), which is an important step in the argument. However, using a simple Taylor expansion, we will see how to circumvent this problem. We remark that it is usually not necessary to condition $T_1$ exactly on $W$; it is typically enough to condition on a larger $\sigma$-algebra. However, some form of conditioning is usually necessary (typically averaging over all possible 'small parts' that can be perturbed).
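Identity (1.12) can also be checked numerically. The sketch below is our own illustration. It uses the deletion coupling $W' = W - X_I$, $G = -nX_I$ for $W = \sum_i X_i$ with $X_i = \pm n^{-1/2}$, and it compares $\mathbb{E}\{Wf(W)\}$ with $\mathbb{E}\{GDf'(W + UD)\}$, that is, (1.6) with $T_1 = GD$ and $T_2 = UD$. The Rademacher summands, $n = 5$, $f = \sin$, and the composite Simpson rule used to average over $U$ are assumptions made for the example.

```python
import itertools
import math


def simpson(g, a, b, m=200):
    """Composite Simpson rule for the integral of g over [a, b]; m must be even."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def compare_sides(n=5, f=math.sin, fprime=math.cos):
    """Return (E{W f(W)}, E{G D f'(W + U D)}) for the deletion coupling."""
    x0 = 1 / math.sqrt(n)          # X_i = +x0 or -x0, each with probability 1/2
    e_wfw = 0.0
    e_gd = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        prob = 0.5 ** n
        xs = [s * x0 for s in signs]
        w = sum(xs)
        e_wfw += prob * w * f(w)
        for xi in xs:              # I uniform on {1, ..., n}
            g, d = -n * xi, -xi    # G = -n X_I and D = W' - W = -X_I
            avg_u = simpson(lambda u: fprime(w + u * d), 0.0, 1.0)  # average over U
            e_gd += (prob / n) * g * d * avg_u
    return e_wfw, e_gd


lhs, rhs = compare_sides()
print(lhs, rhs)
```

The two printed numbers agree up to the quadrature error, which is negligible here because the integrand is smooth and $|D| = n^{-1/2}$ is small.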
We note at this point that there is an infinitesimal version of the perturbation idea, introduced by Stein (1995) and further elaborated by Meckes (2008). It is applicable if the underlying random variables are continuous. Starting with an identity of the form (1.6) with non-trivial $T_1(\varepsilon)$ and $T_2(\varepsilon)$ depending on some $\varepsilon > 0$ (in the original work obtained by means of classic exchangeable pairs), a subsequent limit argument $\varepsilon \to 0$ yields an identity (1.6) with non-trivial $T_1$ but $T_2 = 0$.

Having a coupling $(W, W', G)$ at hand, one is already in a position to apply our theorems and obtain bounds on the closeness to normality in the respective metric. However, bounding $\operatorname{Var}\mathbb{E}^W T_1 = \operatorname{Var}\mathbb{E}^W(GD)$ is not always easy and sometimes not optimal, as one typically has to make use of (truncated) fourth moments of the involved random variables. For this reason it is often beneficial, and sometimes crucial, to introduce other auxiliary random variables.

The first extension is to replace conditioning on $W$ by conditioning on another random variable $W''$, which is still assumed to be close to $W$ but typically independent of $GD$ (though not independent of $W$ and $W'$). In this case the main error term becomes $\operatorname{Var}\mathbb{E}^{W''}(GD)$, which of course vanishes if $W''$ is independent of $GD$. Although this comes at the cost of additional error terms, these are usually easier to bound. We need to emphasize that the distribution of $W''$ is, a priori, irrelevant, and no equation corresponding to (1.8) has to be satisfied for $W''$; the only important features are independence from $GD$ and closeness to $W$. In fact, we can show that, if $(W, W', G)$ is a Stein coupling and $W''$ is independent of $GD$, then we can construct $T_3$ and $T_4$ such that

$$\mathbb{E}\{Wf(W)\} = \mathbb{E}f'(W) + \mathbb{E}\{T_3 f''(W + T_4)\}$$

for all sufficiently smooth functions $f$. Compared to (1.6), this is clearly a step further towards (1.1). Typically, the specific dependence structure in $W$ used to construct $W''$ independently of $GD$ can also be exploited to bound $\operatorname{Var}\mathbb{E}^W(GD)$, so why introduce $W''$ in the first place? The crucial advantage is that the dependence structure can be exploited in a more direct way (that is, at an earlier stage of the proof), avoiding fourth moments, and constructing $W''$ is often easier than bounding $\operatorname{Var}\mathbb{E}^W(GD)$.

Other improvements can be made in specific applications by replacing $D$ by a random variable $\tilde D$ such that $\mathbb{E}^W(G\tilde D) = \mathbb{E}^W(GD)$ (note that we use the same letter only for convenience; $\tilde D$ itself does not have to be the difference of two random variables) and by replacing 1 in (1.7) by a random variable $S$ such that $\mathbb{E}^W S = 1$. Smart choices of $\tilde D$ and $S$ may allow us to construct $W''$ closer to $W$ in order to improve or simplify the error bounds. As we will see in Section 3.2.2 about decomposable random variables, the use of $\tilde D$ can be crucial. However, at first reading one may always assume that $W'' = W$, $\tilde D = D$ and $S = 1$.

We do not claim that all results that have been obtained using Stein's method for normal approximation can be represented in terms of these auxiliary random variables, but we provide evidence that we can cover a large part of them. It may be possible to set up an even more general framework by introducing other auxiliary random variables, so that even very specialised results such as Sunklodas (2008), where very fine and delicate calculations are necessary to obtain optimal rates, could be represented in such a framework. We did not attempt to do so, in order to keep our theorems manageable.

As mentioned after (1.6), zero-biasing couplings, as introduced in Goldstein and Reinert (1997), are the special case of (1.6) where $T_1 = 1$. It is thus not surprising that we are not able to represent a zero-bias coupling directly as a Stein coupling. However, we will show that each coupling $(W, W', G)$ satisfying (1.8) gives rise to a zero-bias construction (but, unfortunately, the construction does not directly lead to a coupling with $W$). Based on an exchangeable pair, Goldstein and Reinert (2005b) propose a way to construct the zero-bias variable $W^z$. We adapt this construction to our more general setting. As such, the zero-bias approach parallels our approach rather than being a special case of it. Therefore, in this article we take a different point of view than, for example, Goldstein and Reinert (1997) and Goldstein and Reinert (2005a), where size and zero biasing are seen to be closely related (both are distributional transformations). From our perspective, size biasing is more closely related to approaches such as the local approach and the exchangeable pairs approach, and we think of zero biasing as being separate from these.

The rest of the article is organized as follows. In the remainder of the 
introduction we will introduce the metrics of interest. In Section 2 we will 
present the main theorems of the article and discuss the crucial error terms. 
In Section 3 we will show how known approaches fit into our framework 
and also present and discuss new couplings. Section 4 is dedicated to some 
applications in order to see different couplings in action. In Section 5 we 
will make the connection with the zero-bias approach and in Section 6 we 
will prove the main results from Section 2. 



1.3. The probability metrics. For probability distribution functions $P$ and $Q$ define

$$d_W(P, Q) = \int_{-\infty}^{\infty} |P(x) - Q(x)|\,dx, \qquad d_K(P, Q) = \sup_{x \in \mathbb{R}} |P(x) - Q(x)|.$$

The first quantity is known as the $L_1$, Wasserstein or Kantorovich metric, and it is only a metric on the set of probability distributions with finite first moment. If $X \sim P$ and $Y \sim Q$ have finite first moments and if $\mathcal{F}_W$ is the set of Lipschitz continuous functions on $\mathbb{R}$ with Lipschitz constant at most 1, we have

$$d_W(P, Q) = \sup_{h \in \mathcal{F}_W} |\mathbb{E}h(X) - \mathbb{E}h(Y)| = \inf \mathbb{E}|X - Y|,$$

where the infimum ranges over all possible couplings of $X$ and $Y$. The second metric is known as the Kolmogorov or uniform metric, and if $\mathcal{F}_K$ denotes the set of half-line indicators we obviously have

$$d_K(P, Q) = \sup_{h \in \mathcal{F}_K} |\mathbb{E}h(X) - \mathbb{E}h(Y)|.$$

If $\varphi$ is a right continuous function on $\mathbb{R}$ such that, for each $\varepsilon > 0$, we have

$$Q(B^\varepsilon) \le Q(B) + \varphi(\varepsilon)$$

for all Borel sets $B \subset \mathbb{R}$, where $B^\varepsilon = \{x : \inf_{y \in B}|x - y| \le \varepsilon\}$, then

$$d_K(P, Q) \le \sqrt{d_W(P, Q) + \varphi(d_W(P, Q))}$$

(see Gibbs and Su (2002) for a compilation of this and similar results). If $Q = N(0,1)$ is the standard normal measure, we can take $\varphi(x) = x\sqrt{2/\pi}$; thus

$$d_K(P, N(0,1)) \le 1.35\sqrt{d_W(P, N(0,1))}. \tag{1.13}$$

However, as is well known, this bound is often not optimal and, in fact, in 
many situations both metrics will exhibit the same rates of convergence. 
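To make the two metrics and (1.13) concrete, the sketch below computes $d_W$ and $d_K$ (up to floating point) between a distribution with finitely many atoms and $N(0,1)$. For the integral defining $d_W$ it uses the antiderivative $x\Phi(x) + \phi(x)$ of $\Phi$. The function names are ours, the standardized symmetric binomial with $n = 16$ is only a test case, and Python's `statistics.NormalDist` supplies $\Phi$, $\phi$ and $\Phi^{-1}$.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def int_Phi(x):
    """Antiderivative x*Phi(x) + phi(x) of Phi; it tends to 0 as x -> -inf."""
    return x * STD.cdf(x) + STD.pdf(x)


def abs_gap_integral(c, a, b):
    """Integral of |c - Phi(x)| over [a, b] for a constant level c in (0, 1)."""
    m = min(max(STD.inv_cdf(c), a), b)  # Phi crosses c at inv_cdf(c); clamp to [a, b]
    below = c * (m - a) - (int_Phi(m) - int_Phi(a))  # Phi <= c on [a, m]
    above = (int_Phi(b) - int_Phi(m)) - c * (b - m)  # Phi >= c on [m, b]
    return below + above


def distances_to_normal(atoms):
    """Return (d_W, d_K) between a finite distribution and N(0, 1).

    `atoms` is a list of (x, probability) pairs with strictly increasing x.
    """
    xs = [x for x, _ in atoms]
    cdf, total = [], 0.0
    for _, q in atoms:
        total += q
        cdf.append(total)
    # Kolmogorov: the supremum is attained at an atom, from the left or the right.
    d_k, left = 0.0, 0.0
    for x, right in zip(xs, cdf):
        d_k = max(d_k, abs(left - STD.cdf(x)), abs(right - STD.cdf(x)))
        left = right
    # Wasserstein: integrate |F - Phi| over the pieces on which F is constant.
    d_w = int_Phi(xs[0])  # F = 0 on (-inf, x_0)
    for j in range(len(xs) - 1):
        d_w += abs_gap_integral(cdf[j], xs[j], xs[j + 1])
    last = xs[-1]
    d_w += STD.pdf(last) - last * (1 - STD.cdf(last))  # F = 1 on [x_last, inf)
    return d_w, d_k


n = 16  # standardized symmetric binomial: W = (2K - n) / sqrt(n)
atoms = [((2 * k - n) / math.sqrt(n), math.comb(n, k) / 2 ** n) for k in range(n + 1)]
d_w, d_k = distances_to_normal(atoms)
print(f"d_W = {d_w:.4f}, d_K = {d_k:.4f}, 1.35*sqrt(d_W) = {1.35 * math.sqrt(d_w):.4f}")
```

For a lattice variable like this one, $d_K$ is at least half the largest atom, while (1.13) only gives a bound of order $\sqrt{d_W}$. This is an instance of the remark that (1.13) is often not optimal.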



2. Main results 

Throughout this section, let all random variables be at least square integrable and defined on the same probability space, with no further assumptions unless explicitly stated. Again, we point out that at first reading one may set $W'' = W$, $\tilde D = D$ and $S = 1$. It may also be helpful to keep in mind the introductory couplings for sums of independent random variables in Section 1.2. Before stating our main theorems, we first define and discuss some error terms in order to express the overall error bounds in the different metrics and using different techniques. Let

$$r_0 = \sup_{\|f\| \vee \|f'\| \le 1} \big|\mathbb{E}\{Gf(W') - Gf(W) - Wf(W)\}\big|, \tag{2.1}$$

where the supremum is taken over all functions $f$ which are bounded by 1 and Lipschitz continuous with Lipschitz constant at most 1. We need to point out that in the proofs the supremum is actually only taken over the solutions to the Stein equation, so that cases where more properties of $f$ are needed (such as better constants) can easily be accommodated. Clearly, if $(W, W', G)$ is a Stein coupling, then $r_0 = 0$. Hence, for the couplings and examples discussed in this article, the actual set of functions over which the supremum is taken is not relevant. In cases where the linearity condition is not exactly satisfied (see e.g. Rinott and Rotar (1997) and Shao (2005)), $r_0$ measures the corresponding error; Shao (2005) handles self-normalised sums, where uniformly bounded derivatives of $f$ are needed to prove that the (implicitly used) $r_0$ is small. Let now

$$D := W' - W, \qquad D' := W'' - W,$$

and let $\tilde D$ be a square integrable and $S$ an integrable random variable on the same probability space. Define

$$r_1 = \mathbb{E}\big|\mathbb{E}^W(GD - G\tilde D)\big|, \qquad r_2 = \mathbb{E}\big|\mathbb{E}^W(1 - S)\big|, \qquad r_3 = \mathbb{E}\big|\mathbb{E}^{W''}(G\tilde D - S)\big|.$$

Clearly, $r_1$ is the error we make by replacing $D$ by $\tilde D$, $r_2$ the error we make by replacing 1 by $S$, and $r_3$ corresponds to the main error term as discussed in the introduction. These error terms will appear irrespective of the metric. Additional error terms which are specific to the metric of interest will be defined in the respective sections.
2.1. Wasserstein distance. Bounds for smooth test functions are typically easier to obtain, as no smoothness of $W$ is required. Let us start with a very general theorem, from which we will then deduce some simpler corollaries.

Theorem 2.1. Let $W$, $W'$, $W''$, $G$ and $\tilde D$ be square integrable random variables and let $S$ be an integrable random variable. Then

$$d_W(\mathcal{L}(W), N(0,1)) \le 2r_0 + 0.8r_1 + 0.8r_2 + 0.8r_3 + 1.6r_4 + r_5 + 1.6r_4' + 2r_5',$$

where

$$r_4 = \mathbb{E}\big|GD\,I[|D| > 1]\big|, \qquad r_4' = \mathbb{E}\big|(G\tilde D - S)\,I[|D'| > 1]\big|,$$

$$r_5 = \mathbb{E}\big|G(D^2 \wedge 1)\big|, \qquad r_5' = \mathbb{E}\big|(G\tilde D - S)(|D'| \wedge 1)\big|.$$

Note that, here and in later results, the truncation constant 1 is not chosen arbitrarily, as truncation at 1 has some optimality properties; see Loh (1975) and Chen and Shao (2001). The difference $|G\tilde D - S|$ in $r_4'$ and $r_5'$ can usually be replaced by $|G\tilde D| + |S|$ without much loss of precision. Under additional, but not too strong, assumptions we have the following.

Corollary 2.2. Let $(W, W', G)$ be a Stein coupling with $\operatorname{Var}W = 1$. Then, under the assumption of finite fourth moments,

$$d_W(\mathcal{L}(W), N(0,1)) \le 0.8\sqrt{\operatorname{Var}\mathbb{E}^W(GD)} + \mathbb{E}|GD^2|. \tag{2.2}$$

Proof. Theorem 2.1 with $S = 1$, $\tilde D = D$ and $W'' = W$ yields the result, but with bigger constants. However, with the assumption of finite fourth moments no truncation is necessary, and one can obtain the better result (2.2) with essentially the same proof as for Theorem 2.1. □
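For the replacement coupling of Section 1.2, every quantity in (2.2) can be computed exactly when the summands take only two values. The sketch below is an illustration under our own assumptions: i.i.d. standardized Bernoulli summands, small $n$, and exact enumeration, none of which comes from the article. It groups outcomes by the value of $W$, which here is determined by the number of successes, to obtain $\operatorname{Var}\mathbb{E}^W(GD)$ and $\mathbb{E}|GD^2|$. The function name is hypothetical.

```python
import itertools
import math


def corollary_2_2_bound(n, p):
    """Exact right-hand side of (2.2) for the replacement coupling.

    X_i = (B_i - p) / sqrt(n p (1 - p)) with B_i ~ Bernoulli(p) i.i.d., so that
    W = sum X_i has mean 0 and variance 1; W' = W - X_I + X'_I and
    G = (n/2)(X'_I - X_I), hence GD = (n/2)(X'_I - X_I)^2.
    Returns (bound, E(GD), Var E^W(GD)).
    """
    s = math.sqrt(n * p * (1 - p))
    value = (-p / s, (1 - p) / s)     # value[b] = X_i when B_i = b
    prob = (1 - p, p)                 # prob[b] = P(B_i = b)
    mass = [0.0] * (n + 1)            # mass[k] = P(K = k), K = number of successes
    moment = [0.0] * (n + 1)          # moment[k] = E[GD; K = k]
    e_abs_gd2 = 0.0
    for bits in itertools.product((0, 1), repeat=n):
        px = math.prod(prob[b] for b in bits)
        k = sum(bits)                 # W is a strictly increasing function of K
        mass[k] += px
        for i in range(n):            # I uniform on {1, ..., n}
            for b in (0, 1):          # X'_I is an independent copy of X_I
                q = px * prob[b] / n
                d = value[b] - value[bits[i]]
                g = 0.5 * n * d
                moment[k] += q * g * d
                e_abs_gd2 += q * abs(g * d * d)
    e_gd = sum(moment)                # equals Var W = 1 by Remark 1.2
    var_cond = sum(m * m / pk for m, pk in zip(moment, mass)) - e_gd ** 2
    return 0.8 * math.sqrt(max(var_cond, 0.0)) + e_abs_gd2, e_gd, var_cond


for p in (0.5, 0.2):
    bound, e_gd, var_cond = corollary_2_2_bound(n=8, p=p)
    print(f"p={p}: E(GD)={e_gd:.6f}, Var E^W(GD)={var_cond:.2e}, bound={bound:.4f}")
```

For symmetric summands ($p = 1/2$), $\mathbb{E}^W(GD) = 1$, so the variance term vanishes and the bound reduces to $\mathbb{E}|GD^2| = 2/\sqrt n$. For $p \neq 1/2$, the conditional expectation depends on $W$ and the first term is non-zero.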

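Corollary 2.3. Let $(W, W', G)$ be a Stein coupling with $\operatorname{Var}W = 1$, and assume that there are $S$ and $\tilde D$ such that $\mathbb{E}^W S = 1$, $\mathbb{E}^W(G\tilde D) = \mathbb{E}^W(GD)$ and $W''$ is independent of $(G\tilde D, S)$. Then, assuming finite third moments of $W$, $W'$, $G$ and $\tilde D$ and $\mathbb{E}|S|^{3/2} < \infty$,

$$d_W(\mathcal{L}(W), N(0,1)) \le \mathbb{E}|GD^2| + 2\mathbb{E}|G\tilde D D'| + 2\mathbb{E}|SD'|.$$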

Using Hölder's inequality, we obtain the following straightforward simplification, which gives the correct order in 'typical' situations; we assume $S = 1$.

Corollary 2.4. Let $(W, W', G)$ be a Stein coupling with $\operatorname{Var}W = 1$, and assume that there is $\tilde D$ such that $\mathbb{E}^W(G\tilde D) = \mathbb{E}^W(GD)$ and $W''$ is independent of $G\tilde D$. Then, if

$$\mathbb{E}|D|^3 \vee \mathbb{E}|\tilde D|^3 \vee \mathbb{E}|D'|^3 \le A^3, \qquad \mathbb{E}|G|^3 \le B^3 \tag{2.3}$$

for some positive constants $A$ and $B$, we have

$$d_W(\mathcal{L}(W), N(0,1)) \le 5A^2B. \tag{2.4}$$
Proof. The only part which may need explanation is that $\mathbb{E}(GD) = 1$, because $\operatorname{Var}W = 1$ by assumption (see Remark 1.2), and hence $1 \le \mathbb{E}|GD|$. This yields

$$\mathbb{E}|D'| \le \mathbb{E}|GD|\,\mathbb{E}|D'|.$$

The remaining part is due to Hölder's inequality. □
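Spelled out, the Hölder step bounds the three terms of Corollary 2.3 (with $S = 1$) as follows. It uses (2.3), the inequality $\mathbb{E}|D'| \le \mathbb{E}|GD|\,\mathbb{E}|D'|$ from the proof, and, in the last line, Hölder's inequality followed by Lyapunov's inequality for $\mathbb{E}|GD|$:

$$
\begin{aligned}
\mathbb{E}|GD^2| &\le (\mathbb{E}|G|^3)^{1/3}(\mathbb{E}|D|^3)^{2/3} \le BA^2,\\
\mathbb{E}|G\tilde D D'| &\le (\mathbb{E}|G|^3)^{1/3}(\mathbb{E}|\tilde D|^3)^{1/3}(\mathbb{E}|D'|^3)^{1/3} \le A^2B,\\
\mathbb{E}|D'| \le \mathbb{E}|GD|\,\mathbb{E}|D'| &\le (\mathbb{E}|G|^3)^{1/3}(\mathbb{E}|D|^3)^{1/3}(\mathbb{E}|D'|^3)^{1/3} \le A^2B,
\end{aligned}
$$

so that $\mathbb{E}|GD^2| + 2\mathbb{E}|G\tilde D D'| + 2\mathbb{E}|D'| \le 5A^2B$, which is (2.4).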

2.2. Kolmogorov distance. Let us now consider bounds with respect to the Kolmogorov metric. To obtain such results, all approaches using Stein's method estimate, at some stage of the proof, probabilities of the form

$$P[A \le W \le B \mid \mathcal{F}], \qquad (2.5)$$

where $A$ and $B$ are $\mathcal{F}$-measurable random variables. This is needed to control the smoothness of $W$, as we are now dealing with a non-smooth metric. Essentially two techniques can be found in the literature to deal with such expressions. In the first approach, bounds on (2.5) are established by techniques similar to those used for Stein's method itself, i.e. Identity (1.1) is applied for special functions $f$ to obtain an explicit bound on (2.5). We call this the concentration inequality approach; see Ho and Chen (1978), Chen and Shao (2005), Shao and Su (2006), Chatterjee, Fulman, and Röllin (2006). In the second approach, (2.5) is bounded indirectly in terms of $d_K(\mathcal{L}(W \mid \mathcal{F}), N(\mu_{\mathcal{F}}, \sigma^2_{\mathcal{F}}))$, where $\mu_{\mathcal{F}}$ and $\sigma^2_{\mathcal{F}}$ are the conditional expectation and variance, respectively, of $W$ (see Lemma 6.2 in Section 6). This leads to some form of recursive inequality; hence we call this the recursive approach. Such inequalities are either solved directly, i.e. if $\mathcal{F}$ is the trivial $\sigma$-algebra (see Rinott and Rotar (1997) and Raič (2003)), or by an inductive argument that makes use of some additional structure in $W$; this is typically the case if $\mathcal{F}$ is a non-trivial $\sigma$-algebra (see Bolthausen (1984)). The recursive approach has the advantage that an explicit bound on (2.5) is not needed.

This comes at the cost of more structure in the coupling.

As has been observed by many authors, such as Rinott and Rotar (1997), Raič (2003) or Chen and Shao (2005) in the context of Stein's method, it is easier to obtain Kolmogorov bounds under some boundedness conditions. In that case the recursive approach can be implemented easily, in fact with $\mathcal{F}$ equal to the trivial $\sigma$-algebra. Boundedness can be relaxed by a truncation argument, but useful results then require fast decaying tails of $G$, $D$, $\tilde D$ and $D'$. We will mainly use this approach for our applications; see also, for example, Shao and Su (2006) and Chatterjee et al. (2006).



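Theorem 2.5. Let $W$, $W'$, $G$, $\tilde D$, $W''$ be square integrable random variables and $S$ be an integrable random variable. Then, for any non-negative constants $\alpha$, $\beta$, $\tilde\beta$, $\beta'$ and $\gamma$,

$$d_K(\mathcal{L}(W),N(0,1)) \le 2\bigl(r_0 + r_1 + r_2 + r_3 + r_6 + r_6' + (\alpha\tilde\beta + \gamma)(\mathbb{E}|W| + 5)\beta' + (\mathbb{E}|W| + 3)\alpha\beta^2\bigr),$$

where

$$r_6 = \mathbb{E}\bigl|GD\,I[|G| > \alpha \text{ or } |D| > \beta]\bigr|,$$

$$r_6' = \mathbb{E}\bigl|(G\tilde D - S)\,I[|G| > \alpha \text{ or } |\tilde D| > \tilde\beta \text{ or } |D'| > \beta' \text{ or } |S| > \gamma]\bigr|.$$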
If a sequence of couplings $(W_n, W_n', G_n, W_n'', \tilde D_n, S_n)_{n\ge 1}$ is under consideration, the truncation points $\alpha$, $\beta$, $\tilde\beta$, $\beta'$ and $\gamma$ will of course need to depend on $n$. In a typical situation, say a sum of $n$ bounded i.i.d. random variables, we will have $\alpha \asymp n^{1/2}$, $\beta \asymp \beta' \asymp \tilde\beta \asymp n^{-1/2}$ and $\gamma \asymp 1$.

Corollary 2.6. Let $(W, W', G)$ be a Stein coupling with $\operatorname{Var}W = 1$. If $|G|$ and $|D|$ are bounded by positive constants $\alpha$ and $\beta$, respectively, then

$$d_K(\mathcal{L}(W),N(0,1)) \le 2\sqrt{\operatorname{Var}\mathbb{E}^W(GD)} + 8\alpha\beta^2.$$

Note that, even if $G$ and $D$ are bounded, we unfortunately cannot deduce a direct, useful bound on $\operatorname{Var}\mathbb{E}^W(GD)$ from that fact. Instead, we again need more structure in order to avoid $\operatorname{Var}\mathbb{E}^W(GD)$.

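Corollary 2.7. Let $(W, W', G)$ be a Stein coupling with $\operatorname{Var}W = 1$, and assume that there are $\tilde D$ such that $\mathbb{E}^W(G\tilde D) = \mathbb{E}^W(GD)$, $S$ such that $\mathbb{E}^WS = 1$, and $W''$ independent of $(G\tilde D, S)$. If the absolute values of $G$, $D$, $\tilde D$, $D'$ and $S$ are bounded by $\alpha$, $\beta$, $\tilde\beta$, $\beta'$ and $\gamma$, respectively, then

$$d_K(\mathcal{L}(W),N(0,1)) \le 8\alpha\beta^2 + 12\alpha\tilde\beta\beta' + 12\gamma\beta'.$$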

We emphasize the remarkable statement of Corollary 2.7: under the stated conditions we immediately obtain a bound on the Kolmogorov distance to the standard normal without any additional computations. Examples are easy to find, such as bounded, locally dependent random variables; see Section 3.2.
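The simplest such example is bounded i.i.d. summands. Take $W = n^{-1/2}\sum_i X_i$ with centred, unit-variance $X_i$ satisfying $|X_i| \le c$, and set $G = -n^{1/2}X_I$, $W' = W'' = W - n^{-1/2}X_I$, $\tilde D = D$ and $S = 1$. Then $W''$ is independent of $(G\tilde D, S) = (X_I^2, 1)$, and Corollary 2.7 applies with $\alpha = c\sqrt{n}$, $\beta = \tilde\beta = \beta' = c/\sqrt{n}$ and $\gamma = 1$, giving $(20c^3 + 12c)/\sqrt{n}$. The sketch below, with function names of our own, simply evaluates the bound.

```python
import math


def corollary_2_7_bound(alpha, beta, beta_tilde, beta_prime, gamma):
    """Kolmogorov bound 8 alpha beta^2 + 12 alpha beta~ beta' + 12 gamma beta'."""
    return (8 * alpha * beta**2
            + 12 * alpha * beta_tilde * beta_prime
            + 12 * gamma * beta_prime)


def iid_bounded_bound(c, n):
    """Bound for W = n^{-1/2} sum X_i with X_i i.i.d., centred, unit variance, |X_i| <= c."""
    alpha = c * math.sqrt(n)                           # |G| = sqrt(n) |X_I|
    beta = beta_tilde = beta_prime = c / math.sqrt(n)  # |D| = |D~| = |D'| = |X_I| / sqrt(n)
    return corollary_2_7_bound(alpha, beta, beta_tilde, beta_prime, gamma=1.0)


print(iid_bounded_bound(c=1.0, n=10_000))
```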



The approach we use for the next theorem was developed by Chen and Shao (2004) for locally dependent random variables. Although Chen and Shao (2004) used a concentration inequality approach, the recursive approach is easy to implement without loss of precision. As in Theorem 2.5, the aim is to obtain a bound involving (2.5) with respect to the unconditional $W$. This comes at the cost of truncated fourth moments, especially in the form of $r_8$. Hence, the device of avoiding truncated fourth moments by making use of $W''$ in $r_3$ is not useful here, because of the presence of $r_8$. We therefore give below only a version for $W'' = W$, $\tilde D = D$ and $S = 1$, to avoid unnecessary overloading of the bound. To define some additional error terms, let

$$K(t) := G\bigl(I[0 \le t < D] - I[D \le t < 0]\bigr), \qquad (2.6)$$

$$K^W(t) := \mathbb{E}^W K(t), \qquad \hat K(t) := \mathbb{E}K(t).$$
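The indicators in (2.6) integrate in a simple way: $\int K(t)\,dt = GD$ and $\int |t|K(t)\,dt = GD|D|/2$. This is how the minima $|D_i| \wedge |D_j|$ arise when $\operatorname{Var}K^W(t)$ is integrated. The following Python sketch checks both identities numerically for fixed values of $G$ and $D$ with a midpoint rule; the function names and grid size are our own choices.

```python
def K(t, G, D):
    """K(t) = G (I[0 <= t < D] - I[D <= t < 0]) as in (2.6), for fixed G and D."""
    return G * (int(0 <= t < D) - int(D <= t < 0))


def midpoint_integral(fun, lo, hi, m=50_000):
    """Midpoint-rule approximation of the integral of fun over [lo, hi]."""
    h = (hi - lo) / m
    return h * sum(fun(lo + (k + 0.5) * h) for k in range(m))


for G, D in [(1.7, 0.6), (1.7, -0.6), (-2.0, 0.25)]:
    plain = midpoint_integral(lambda t: K(t, G, D), -1.0, 1.0)
    weighted = midpoint_integral(lambda t: abs(t) * K(t, G, D), -1.0, 1.0)
    print(G, D, plain, G * D, weighted, G * D * abs(D) / 2)
```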

Theorem 2.8. Let $W$, $W'$ and $G$ be square integrable random variables on the same probability space. Then

$$d_K(\mathcal{L}(W),N(0,1)) \le 2r_0 + 2r_3 + 2r_4 + 2(\mathbb{E}|W| + 2.4)r_5 + 1.4r_7 + 2\bigl((\mathbb{E}(|W|+1)^2)^{1/2} + 1.1\bigr)r_8,$$

where $r_3 = \mathbb{E}|\mathbb{E}^W(GD) - 1|$, where $r_4$ and $r_5$ are defined as in Theorem 2.1, and where

$$r_7 = \int_{|t|\le 1}\operatorname{Var}K^W(t)\,dt, \qquad r_8 = \Bigl(\int_{|t|\le 1}|t|\operatorname{Var}K^W(t)\,dt\Bigr)^{1/2}.$$
Let us discuss $\operatorname{Var}K^W(t)$. Typically, $W$ will consist of $n$ parts, such as a sum of $n$ random variables or a functional in $n$ coordinates. In this case, the perturbation $W'$ is typically constructed by picking a small part of $W$, say part $I$, where $I$ is uniform on $\{1,\dots,n\}$, and then perturbing this part. Thus, we typically have $G := nY_I$ and $D := D_I$ for sequences $Y_1,\dots,Y_n$ and $D_1,\dots,D_n$, and define $W' := W + D_I$, where, with $\sigma^2 = \operatorname{Var}W$, $\mathbb{E}|Y_i| = O(\sigma^{-1/2})$ and $\mathbb{E}|D_i| = O(\sigma^{-1/2})$. Now let $(W^*, W'^*, G^*)$ be an independent copy of $(W, W', G)$ and let $\tilde I_t(x) = I[0 \le t < x] - I[x \le t < 0]$. Then we can write

$$\operatorname{Var}K^W(t) \le \sum_{i,j=1}^n \operatorname{Cov}\bigl(Y_i\tilde I_t(D_i), Y_j\tilde I_t(D_j)\bigr) = \sum_{i,j=1}^n \mathbb{E}\bigl\{Y_iY_j\tilde I_t(D_i)\tilde I_t(D_j) - Y_iY_j^*\tilde I_t(D_i)\tilde I_t(D_j^*)\bigr\},$$

so that

$$r_7 \le \sum_{i,j=1}^n \mathbb{E}\bigl\{Y_iY_jI[D_iD_j > 0](|D_i| \wedge |D_j| \wedge 1) - Y_iY_j^*I[D_iD_j^* > 0](|D_i| \wedge |D_j^*| \wedge 1)\bigr\}$$

and

$$r_8^2 \le \frac12\sum_{i,j=1}^n \mathbb{E}\bigl\{Y_iY_jI[D_iD_j > 0](|D_i|^2 \wedge |D_j|^2 \wedge 1) - Y_iY_j^*I[D_iD_j^* > 0](|D_i|^2 \wedge |D_j^*|^2 \wedge 1)\bigr\}.$$

In the case of local dependence, these quantities can be bounded relatively easily; see Chen and Shao (2004).



If truncated fourth moments are to be avoided and no boundedness can be assumed, more structure seems to be needed in the coupling. Typical instances are the use of higher-order neighbourhoods under local dependence, as in Chen and Shao (2004), or the recursive structure in the combinatorial CLT in Bolthausen (1984). The theorem below is the basis for such results. It contains expressions of the form (2.5) explicitly, so further steps are needed to reach a final bound.

For a random element $X$ defined on the same probability space as $W$, define

$$\delta_\varepsilon(X) = \sup_a P[a \le W \le a + \varepsilon \mid X],$$

where we assume without further mention that the regular conditional probability exists.

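Theorem 2.9. Let $W$, $W'$, $W''$, $\tilde D$ and $G$ be random variables with finite third moments and $S$ be a random variable with $\mathbb{E}|S|^{3/2} < \infty$. Then, for any $\varepsilon > 0$,

$$d_K(\mathcal{L}(W),N(0,1)) \le r_0 + r_1 + r_2 + r_3 + r_9 + 0.5r_{10} + \varepsilon^{-1}r_{11}(\varepsilon) + 0.5\varepsilon^{-1}r_{12}(\varepsilon) + 0.4\varepsilon, \qquad (2.7)$$

where

$$r_9 = \mathbb{E}|(S - G\tilde D)(|W| + 1)(|D'| \wedge 1)|, \qquad r_{10} = \mathbb{E}|G(|W| + 1)(D^2 \wedge 1)|,$$

$$r_{11}(\varepsilon) = \mathbb{E}|(S - G\tilde D)D'\,\delta_\varepsilon(G, \tilde D, D', S)|, \qquad r_{12}(\varepsilon) = \mathbb{E}|GD^2\,\delta_\varepsilon(G, D)|.$$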



We now look at a method to obtain a final bound from the above theorem using induction. Although never mentioned in the literature on Stein's method, this type of argument can be traced back to Bergström (1944), who uses Lindeberg's method and an inductive argument to prove a Kolmogorov bound in the CLT. The argument was later used by Bolthausen (1982) in the context of martingale central limit theorems. Lemma 2.10 below provides the key element of the inductive approach in the context of Stein's method, as introduced by Bolthausen (1984). It can be used to obtain a final bound from an estimate of the form (2.7), provided that $W = W_n$ has some recursive structure and $\delta_\varepsilon$ can be expressed in terms of the closeness of $W_1, W_2, \dots, W_{n-1}$ to the standard normal distribution. In the following lemma, the numbers $\kappa_k$, $k = 1,\dots,n$, denote the respective bounds on the Kolmogorov distance between $\mathcal{L}(W_k)$ and the standard normal distribution. Whereas Bolthausen (1984) uses a recursion involving $\kappa_{n-4},\dots,\kappa_{n-1}$, Goldstein (2010) introduces a version involving all of $\kappa_1,\dots,\kappa_{n-1}$ to prove Berry–Esseen type bounds for degree counts in the Erdős–Rényi random graph using size biasing. Incidentally, Bergström (1944) already uses $\kappa_1,\dots,\kappa_{n-1}$ for his inductive argument, although his argument is of a somewhat different flavour. The following lemma is inspired by the work of Goldstein (2010), but adapted to be used along with Theorem 2.9. An independent proof is given in Section 6.

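Lemma 2.10. Let $\kappa_1,\dots,\kappa_n$ be a sequence of non-negative numbers such that $\kappa_1 \le 1$. Assume that there is a constant $A \ge 0$, a triangular array $A_{k,1},\dots,A_{k,k-1} \ge 0$, $k = 2,3,\dots,n$, and a sequence $\sigma_2,\dots,\sigma_n > 0$ such that, for all $\varepsilon > 0$ and all $2 \le k \le n$,

$$\kappa_k \le \frac{A}{\sigma_k} + 0.4\varepsilon + \frac{1}{\varepsilon\sigma_k}\sum_{l=1}^{k-1}A_{k,l}\kappa_l.$$

Then,

$$\kappa_n \le \frac{\bigl(5(A \vee 1) + 2a_n + a_n'\bigr)\bigl(2a_n + a_n'\bigr)}{5a_n'\sigma_n}, \qquad (2.8)$$

where, with the convention $\sigma_1 := 1$,

$$a_n = \sup_{2\le k\le n}\sum_{l=1}^{k-1}\frac{\sigma_k}{\sigma_l}A_{k,l}, \qquad a_n' = \sqrt{2a_n\bigl(2a_n + 5(A \vee 1)\bigr)}.$$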
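The right-hand side of (2.8) is easy to evaluate. The sketch below, using helpers of our own, computes it and checks one way to see where it comes from. Write $C$ for $\sigma_n$ times the right-hand side of (2.8) and set $c = (2a_n + a_n')/2$. Suppose $\kappa_l \le C/\sigma_l$ for all $l < k$, and choose $\varepsilon = c/\sigma_k$ in the hypothesis. This gives $\kappa_k \le ((A\vee 1) + 0.4c + Ca_n/c)/\sigma_k$, and the identity $C = (A\vee 1) + 0.4c + Ca_n/c$ closes the induction. The convention $\sigma_1 = 1$ is assumed, as above.

```python
import math


def lemma_2_10_bound(A, a_n, sigma_n):
    """Right-hand side of (2.8): an upper bound on kappa_n."""
    A1 = max(A, 1.0)
    if a_n == 0:
        return A1 / sigma_n  # limit of the formula as a_n -> 0
    a_prime = math.sqrt(2 * a_n * (2 * a_n + 5 * A1))
    return (5 * A1 + 2 * a_n + a_prime) * (2 * a_n + a_prime) / (5 * a_prime * sigma_n)


def induction_step_residual(A, a_n):
    """C - ((A v 1) + 0.4 c + C a_n / c) with C = sigma_n * bound and c = (2 a_n + a_n') / 2.

    A residual of zero means that kappa_l <= C / sigma_l for all l < k implies
    kappa_k <= C / sigma_k when eps = c / sigma_k is used in the hypothesis."""
    A1 = max(A, 1.0)
    a_prime = math.sqrt(2 * a_n * (2 * a_n + 5 * A1))
    c = (2 * a_n + a_prime) / 2
    C = lemma_2_10_bound(A, a_n, 1.0)
    return C - (A1 + 0.4 * c + C * a_n / c)


print(lemma_2_10_bound(A=8.0, a_n=5 * math.sqrt(2), sigma_n=1.0))
print(induction_step_residual(A=8.0, a_n=5 * math.sqrt(2)))
```

The choice $c = (2a_n + a_n')/2$ is the one that minimises $(A\vee 1 + 0.4c)\,c/(c - a_n)$ over $c > a_n$, which is where $a_n'$ comes from.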



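Example 2.1. Let $W_n = n^{-1/2}\sum_{i=1}^n X_i$, where the $X_i$ are i.i.d. with $\mathbb{E}X_i = 0$, $\operatorname{Var}X_i = 1$ and $\mathbb{E}|X_i|^3 = \gamma \ge 1$. Let $\kappa_n = d_K(\mathcal{L}(W_n), N(0,1))$. Set $G = -n^{1/2}X_I$ and $W' = W - n^{-1/2}X_I$. Set also $W'' = W'$, $\tilde D = D$ and $S = 1$. Hence $D = D' = -n^{-1/2}X_I$. We have

$$r_0 = r_1 = r_2 = r_3 = 0.$$

Furthermore,

$$r_9 \le 6\gamma/\sqrt{n}, \qquad r_{10} \le 3\gamma/\sqrt{n}.$$

Note now (cf. Lemma 6.2) that

$$\delta_\varepsilon(X_n) = \sup_a P[a \le W_n \le a + \varepsilon \mid X_n] = \sup_a P\Bigl[a \le \sqrt{\tfrac{n-1}{n}}\,W_{n-1} + n^{-1/2}X_n \le a + \varepsilon \Bigm| X_n\Bigr],$$

where $W_{n-1} = (n-1)^{-1/2}\sum_{i=1}^{n-1}X_i$ is independent of $X_n$; hence

$$r_{11}(\varepsilon) \le 2\gamma(\varepsilon + 2\kappa_{n-1})/\sqrt{n}, \qquad r_{12}(\varepsilon) \le \gamma(\varepsilon + 2\kappa_{n-1})/\sqrt{n}.$$

Putting these estimates into Theorem 2.9, we obtain

$$\kappa_n \le \frac{8\gamma}{\sqrt{n}} + 0.4\varepsilon + \frac{5\gamma\kappa_{n-1}}{\varepsilon\sqrt{n}}.$$

We can apply Lemma 2.10 with $A = 8\gamma$, $A_{k,k-1} = 5\gamma$, $A_{k,l} = 0$ for $l < k - 1$, and $\sigma_k = k^{1/2}$. We have $a_n = 5\gamma\sqrt{2}$; plugging this into (2.8) gives $\kappa_n \le 25\gamma/\sqrt{n}$. As this example illustrates, the constants obtained this way are typically not optimal, but nevertheless explicit.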
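The constant $25\gamma$ can also be compared with the worst case allowed by the recursive inequality of Example 2.1. Since that inequality holds for every $\varepsilon > 0$, minimising $0.4\varepsilon + b/\varepsilon$ over $\varepsilon$ gives $2\sqrt{0.4b}$. The largest sequence compatible with the recursion, starting from $\kappa_1 = 1$, can therefore be computed step by step. The following sketch (the function name is ours) does this and confirms numerically that $\sqrt{k}\,\kappa_k \le 25\gamma$ along the way. It says nothing about the true Kolmogorov distances, only about the recursion.

```python
import math


def worst_case_kappas(gamma, n_max):
    """Largest kappa_1, ..., kappa_{n_max} with kappa_1 <= 1 and, for k >= 2 and all eps > 0,
    kappa_k <= 8 gamma / sqrt(k) + 0.4 eps + 5 gamma kappa_{k-1} / (eps sqrt(k))."""
    kappas = [1.0]
    for k in range(2, n_max + 1):
        b = 5 * gamma * kappas[-1] / math.sqrt(k)
        # min over eps > 0 of 0.4 eps + b / eps is attained at eps = sqrt(b / 0.4)
        kappas.append(8 * gamma / math.sqrt(k) + 2 * math.sqrt(0.4 * b))
    return kappas


for gamma in (1.0, 2.0, 10.0):
    ks = worst_case_kappas(gamma, 5000)
    worst = max(math.sqrt(k) * kap for k, kap in enumerate(ks, start=1))
    print(gamma, worst / gamma)
```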



3. Couplings 

In this section we present some well-known and some new couplings and show how they can be represented in our general framework. The basis is always the coupling $(W, W', G)$. Throughout this section (with the exception of classic exchangeable pairs) we only look at actual Stein couplings, that is, cases where $r_0 = 0$. This implies in particular that $\mathbb{E}W = 0$, which nevertheless has to be assumed explicitly in some cases to make the construction work in the first place. Unless otherwise stated, the variance $\sigma^2$ of $W$ is arbitrary but finite and non-zero. Note that, if $(W, W', G)$ is a Stein coupling, so is $(W/\sigma, W'/\sigma, G/\sigma)$; hence we will usually omit the standardising constant $\sigma^{-1}$ for ease of notation. To simplify or optimize the bounds, we will sometimes extend the coupling by different choices of $\tilde D$, $W''$ and $S$. Unless otherwise stated, however, we make the basic assumption throughout this section that $\tilde D = D$, $W'' = W$ and $S = 1$.

We mostly present only the construction of the couplings and not the particular form of the final bounds for the normal approximation. The reason is that, once the coupling is constructed, one can directly apply the theorems or corollaries of the main section to obtain the corresponding bounds. Stating them explicitly would therefore either repeat known results from the literature or rephrase the results from the main section.

We stress again that a Stein coupling by itself by no means implies closeness to normality or any convergence. As the case of quadratic forms (Section 3.2.3) shows, Stein couplings as defined by (1.8) can also be used for $\chi^2$ approximation.

Throughout this part, let $[n] := \{1, 2, \dots, n\}$ and $[0] := \emptyset$. Also, in general, let $I$ and $J$ be independent random variables, uniformly distributed on $[n]$ and independent of all else. We will usually state this, and any deviation from it, explicitly whenever we use these random variables.
3.1. Exchangeable pairs and extensions. This approach was introduced by Stein in a paper by Diaconis (1977). A systematic exposition was given by Stein (1986).

Construction 1A. Assume that $(W, W')$ is an exchangeable pair. If, for some constant $\lambda > 0$, we have

$$\mathbb{E}^W(W' - W) = -\lambda W, \qquad (3.1)$$

then $\bigl(W, W', \frac{1}{2\lambda}(W' - W)\bigr)$ is a Stein coupling.



Rinott and Rotar (1997) generalised (3.1) to allow for some non-linearity; however, the resulting coupling is then only an approximate Stein coupling.
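Construction 1A can be checked exactly on a toy example of our own choosing by enumeration. Take $W = X_1 + \dots + X_n$ with i.i.d. Rademacher $X_i$, and let $W' = W - X_I + X_I'$ with $X_I'$ an independent copy. Then $\mathbb{E}^W(W' - W) = -W/n$, so (3.1) holds with $\lambda = 1/n$. The sketch below checks this linearity for every configuration and then verifies (1.8) in exact rational arithmetic for a given test function $f$.

```python
from fractions import Fraction
from itertools import product


def check_construction_1a(n, f):
    """Exact check of (1.8) for W = X_1 + ... + X_n with i.i.d. Rademacher X_i,
    W' = W - X_I + X'_I and G = (W' - W) / (2 lam), lam = 1/n."""
    lam = Fraction(1, n)
    lhs = Fraction(0)  # E{G f(W') - G f(W)}
    rhs = Fraction(0)  # E{W f(W)}
    for xs in product((-1, 1), repeat=n):
        p_xs = Fraction(1, 2**n)
        w = sum(xs)
        rhs += p_xs * w * f(w)
        drift = Fraction(0)  # E^X (W' - W)
        for i in range(n):             # the uniformly chosen index I
            for x_new in (-1, 1):      # X'_I, an independent copy of X_I
                weight = Fraction(1, n) * Fraction(1, 2)
                w_new = w - xs[i] + x_new
                g = (w_new - w) / (2 * lam)
                lhs += p_xs * weight * g * (f(w_new) - f(w))
                drift += weight * (w_new - w)
        assert drift == -lam * w  # linearity condition (3.1)
    return lhs, rhs


lhs, rhs = check_construction_1a(4, lambda x: x**3)
print(lhs == rhs)
```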

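Construction 1B. Assume that $(W, W')$ is an exchangeable pair with $\mathbb{E}W = 0$ and $\operatorname{Var}W = 1$. Assume that, for some constant $\lambda > 0$, we have

$$\mathbb{E}^W(W' - W) = -\lambda W + R. \qquad (3.2)$$

Then, with $G = \frac{1}{2\lambda}(W' - W)$,

$$r_0 \le \lambda^{-1}\mathbb{E}|R|, \qquad |\mathbb{E}(GD) - 1| \le \lambda^{-1}|\mathbb{E}(WR)| \le \lambda^{-1}\sqrt{\operatorname{Var}R}$$

(note that we use $\operatorname{Var}W = 1$ only to obtain the last two inequalities).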

The conditional expectation can of course always be written in the form (3.2), for any $\lambda$. However, we need $\lambda^{-1}\sqrt{\operatorname{Var}R} \to 0$ to obtain convergent bounds, and in this sense the choice of $\lambda$ is, at least asymptotically, unique; see the discussion in the introduction of Reinert and Röllin (2009a).

Note that we call this approach 'classic' for this specific choice of $G$. There are many other ways to construct Stein couplings where $(W, W')$ is exchangeable but $G$ is not a multiple of $W' - W$; we give such examples later on.

The classic exchangeable pairs approach is frequently used in the literature; see for example Rinott and Rotar (1997), Fulman (2004a), Fulman (2004b), Röllin (2007), Meckes (2009) and others. Generally, one constructs a "natural" exchangeable pair $(W, W')$ and then hopes that (3.2) holds with $R = 0$ or with $R$ small enough to yield convergence. More often than not, however, this does not succeed, even for simple examples, as the 2-runs example below illustrates. Based on work by Reinert and Röllin (2009a), we present in Section 3.1.1 Stein couplings that use a multivariate extension of (3.1), which leads to appropriate modifications of $G$ such that (1.8) holds. In Sections 3.2.4 and 3.4 we also present two very general couplings that are based on exchangeable pairs, but where $G$ is chosen rather differently.

A few more detailed remarks about this approach are appropriate here. For the specific choice $G = (W' - W)/(2\lambda)$, Röllin (2008a) proves that exchangeability is actually not necessary to prove a result such as Corollary 2.3, as long as we have equal marginals $\mathcal{L}(W) = \mathcal{L}(W')$. Röllin (2008a) uses a different way of deducing a Stein identity of the form (1.1). With $F(w) = \int_0^w f(x)\,dx$, Taylor's expansion gives

$$F(W') - F(W) = Df(W) + D\int_0^D (1 - s/D)\,f'(W + s)\,ds,$$
so that, taking expectations, assuming $\mathcal{L}(W') = \mathcal{L}(W)$ and using $\mathbb{E}^W D = -\lambda W$, we obtain, again with $G = D/(2\lambda)$,

$$\mathbb{E}\{Wf(W)\} = 2\,\mathbb{E}\Bigl\{G\int_0^D (1 - s/D)\,f'(W + s)\,ds\Bigr\}, \qquad (3.3)$$

which serves as a replacement for the Stein coupling identity. In contrast, Stein (1986) uses the antisymmetric function approach. If $(W, W')$ is exchangeable, then $\mathbb{E}\{(W' - W)(f(W') + f(W))\} = 0$, and it is not difficult to show that

$$\mathbb{E}\{Wf(W)\} = \mathbb{E}\Bigl\{G\int_0^D f'(W + s)\,ds\Bigr\}. \qquad (3.4)$$

This is almost (3.3), except that the factor $(1 - s/D)$ is replaced by $1/2$. Note again that (3.4) holds only under exchangeability, whereas (3.3) holds for equal marginals. Surprisingly, better constants can be obtained if (3.3) is used instead of (3.4), although exchangeability is a stronger assumption. Incidentally, Section 4.4 uses a coupling which is not an exchangeable pair but has equal marginals, albeit in the context of a construction different from Construction 1A.



3.1.1. Multivariate exchangeable pairs. In Reinert and Röllin (2009a), the classic exchangeable pairs approach was generalised to $d$-dimensional vectors $W = (W_1,\dots,W_d)$ and $W' = (W_1',\dots,W_d')$ which satisfy

$$\mathbb{E}^W(W' - W) = -\Lambda W \qquad (3.5)$$

for some invertible $(d \times d)$-matrix $\Lambda$. They obtain multivariate normal approximation results in cases where the exchangeable pair of univariate random variables $(W_1, W_1')$ does not satisfy (3.1), but where an embedding of that pair into a higher dimensional space, using auxiliary random variables, satisfies (3.5). However, the transition to higher dimensions comes at the cost of imposing stronger conditions on the set of test functions. Hence, besides the multivariate approximation, it is still of interest to examine $W_1$ directly. It turns out that, once a higher dimensional embedding satisfying (3.5) is found, it is easy to construct a Stein coupling from it.

Construction 1C. Let $(W, W')$ be an exchangeable pair of $d$-dimensional random vectors satisfying (3.5) for some invertible $\Lambda$. Let $e_i$ be the $i$-th unit vector. Then

$$\Bigl(W_i, W_i', \tfrac12 e_i^{\mathsf T}\Lambda^{-1}(W' - W)\Bigr)$$

is a Stein coupling.

Indeed,

$$-\mathbb{E}\{Gf(W_i)\} = -\tfrac12\mathbb{E}\{e_i^{\mathsf T}\Lambda^{-1}\mathbb{E}^W(W' - W)f(W_i)\} = \tfrac12\mathbb{E}\{e_i^{\mathsf T}\Lambda^{-1}\Lambda Wf(W_i)\} = \tfrac12\mathbb{E}\{W_if(W_i)\},$$

and, using exchangeability, the corresponding result for $\mathbb{E}\{Gf(W_i')\}$ is obtained in the same way. Hence, every multidimensional exchangeable pair $(W, W')$ satisfying (3.5) gives rise to a univariate Stein coupling for each individual coordinate.

Let us consider the case of 2-runs on a circle. To this end, let $\xi_1,\dots,\xi_n$ be a sequence of independent $\mathrm{Be}(p)$ distributed random variables. Let $V = \sum_{i=1}^n(\xi_i\xi_{i+1} - p^2)$ be the centered number of 2-runs, where we put $\xi_{n+1} = \xi_1$ (hence 'circle'). Consider now the following coupling. With $\xi_1',\dots,\xi_n'$ being independent copies of $\xi_1,\dots,\xi_n$, let

$$V' = V - \xi_{I-1}\xi_I - \xi_I\xi_{I+1} + \xi_{I-1}\xi_I' + \xi_I'\xi_{I+1},$$

where $I$ is uniformly distributed on $[n]$ and independent of all else (and $\xi_0 = \xi_n$). It is easy to see that $(V, V')$ is an exchangeable pair and that

$$\mathbb{E}^{\xi}(V' - V) = -\frac{2}{n}V + \frac{2p}{n}\sum_{i=1}^n(\xi_i - p), \qquad (3.6)$$

where $\mathbb{E}^\xi$ denotes conditional expectation given $\xi_1,\dots,\xi_n$.



Even in this very simple example, the linearity condition (3.2) cannot be obtained with the above natural coupling. Based on the same exchangeable pair, Reinert and Röllin (2009a) use the embedding method to circumvent this problem. To this end, we introduce the auxiliary statistic $U = \sum_{i=1}^n(\xi_i - p)$ and define $U' = U - \xi_I + \xi_I'$. Condition (3.5) is now satisfied for $W = (U, V)$, $W' = (U', V')$ and

$$\Lambda = \frac{1}{n}\begin{pmatrix} 1 & 0 \\ -2p & 2 \end{pmatrix}.$$

The inverse of $\Lambda$ is

$$\Lambda^{-1} = n\begin{pmatrix} 1 & 0 \\ p & 1/2 \end{pmatrix},$$

hence Construction 1C yields

$$G = \frac{n}{2}\Bigl(p(U' - U) + \frac12(V' - V)\Bigr), \qquad (3.7)$$



so that $(V, V', G)$ is a Stein coupling. Exploiting some specific properties of this example, we may also choose

$$G = \dots \qquad (3.8)$$



to obtain a somewhat simpler Stein coupling; the similarity between (3.7) and (3.8) is apparent. See Reinert and Röllin (2009a), Reinert and Röllin (2009b) and Ghosh (2010) for further examples of multivariate exchangeable pair couplings.
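The identity behind (3.7) can be verified exactly for small $n$ by enumerating all configurations. The sketch below (helper names are ours) checks (3.6) configuration by configuration. It then compares $\mathbb{E}\{Gf(V') - Gf(V)\}$ with $\mathbb{E}\{Vf(V)\}$ for $G$ from (3.7), in exact fractions. It assumes $n \ge 3$, so that $\xi_{I-1}$ and $\xi_{I+1}$ are distinct variables.

```python
from fractions import Fraction
from itertools import product


def two_runs_check(n, p, f):
    """Exact check of the 2-runs coupling on a circle with G from (3.7); p is a Fraction."""
    lhs = Fraction(0)  # E{G f(V') - G f(V)}
    rhs = Fraction(0)  # E{V f(V)}
    for xi in product((0, 1), repeat=n):
        prob = Fraction(1)
        for x in xi:
            prob *= p if x else 1 - p
        U = sum(x - p for x in xi)
        V = sum(xi[i] * xi[(i + 1) % n] - p * p for i in range(n))
        rhs += prob * V * f(V)
        drift = Fraction(0)  # E^xi (V' - V)
        for i in range(n):              # the uniformly chosen site I
            for new in (0, 1):          # the resampled value xi'_I
                q = Fraction(1, n) * (p if new else 1 - p)
                dU = new - xi[i]
                dV = dU * (xi[i - 1] + xi[(i + 1) % n])  # xi[-1] wraps around the circle
                G = Fraction(n, 2) * (p * dU + Fraction(dV, 2))
                lhs += prob * q * G * (f(V + dV) - f(V))
                drift += q * dV
        assert drift == Fraction(-2, n) * V + Fraction(2, n) * p * U  # (3.6)
    return lhs, rhs


lhs, rhs = two_runs_check(5, Fraction(1, 3), lambda v: v**3)
print(lhs == rhs)
```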

3.1.2. Finding W for a given coupling and a given G. In some cases it may not be clear from the beginning how to choose the main random variable of interest $W$. Consider the Curie–Weiss model of ferromagnetic interaction. With $\beta \ge 0$ the inverse temperature and $h \in \mathbb{R}$ the external field on the state space $\{-1,1\}^n$, we define the probability of each $\sigma = (\sigma_1,\dots,\sigma_n) \in \{-1,1\}^n$ by the Gibbs measure

$$P[\{\sigma\}] = Z^{-1}\exp\Bigl(\frac{\beta}{n}\sum_{i<j}\sigma_i\sigma_j + \beta h\sum_i\sigma_i\Bigr), \qquad (3.9)$$

where $Z = Z(\beta, h, n)$ is the partition function that makes the probabilities sum up to 1. A quantity of interest is the magnetization $m(\sigma) = n^{-1}\sum_i\sigma_i \in [-1,1]$ of the system. However, in the low-temperature regime the system exhibits spontaneous magnetization. So, if $\sigma$ is drawn at random according to (3.9), we may be interested not in $m(\sigma)$ itself but in $m(\sigma)$ relative to its corresponding magnetization. To find a suitable correction term (which serves here as an illustrative example only), Chatterjee (2007) proposes the following construction.

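Construction 1D. Let $(\sigma, \sigma')$ be an exchangeable pair on some measure space and let $\varphi(\sigma, \sigma')$ be an antisymmetric function. Let $G = -\varphi(\sigma, \sigma')/2$, $W = W(\sigma) = \mathbb{E}^\sigma\varphi(\sigma, \sigma')$ and $W' = W(\sigma') = \mathbb{E}^{\sigma'}\varphi(\sigma', \sigma)$. Then this defines a Stein coupling.

Indeed,

$$-\mathbb{E}\{Gf(W)\} = \tfrac12\mathbb{E}\{\varphi(\sigma, \sigma')f(W)\} = \tfrac12\mathbb{E}\{Wf(W)\}$$

and

$$\mathbb{E}\{Gf(W')\} = -\tfrac12\mathbb{E}\{\varphi(\sigma, \sigma')f(W(\sigma'))\} = -\tfrac12\mathbb{E}\{\varphi(\sigma', \sigma)f(W(\sigma))\} = \tfrac12\mathbb{E}\{\varphi(\sigma, \sigma')f(W)\} = \tfrac12\mathbb{E}\{Wf(W)\},$$

which proves (1.8).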

Let us apply this to the Curie–Weiss model. First, given $\sigma$ drawn from (3.9), we define $\sigma'$ by choosing a site $I$ uniformly at random and resampling this site according to the conditional distribution $\mathcal{L}(\sigma_I \mid \sigma_j, j \ne I)$. This gives a new $\sigma_I'$ and leaves all other sites untouched. Now we set $\varphi(\sigma, \sigma') = n(m(\sigma) - m(\sigma')) = \sigma_I - \sigma_I'$. It is not difficult to show that

$$\mathbb{E}^\sigma\varphi(\sigma, \sigma') = m(\sigma) - \frac1n\sum_i\tanh\bigl(\beta m_i(\sigma) + \beta h\bigr),$$

where $m_i(\sigma) = \frac1n\sum_{j\ne i}\sigma_j$. Hence, we let $G = -(\sigma_I - \sigma_I')/2$,

$$W = W(\sigma) = m(\sigma) - \frac1n\sum_i\tanh\bigl(\beta m_i(\sigma) + \beta h\bigr),$$

and $W' := W(\sigma')$. As

$$\Bigl|\tanh\bigl(\beta m(\sigma) + \beta h\bigr) - \frac1n\sum_i\tanh\bigl(\beta m_i(\sigma) + \beta h\bigr)\Bigr| \le \frac{\beta}{n},$$

we may alternatively choose $W = m(\sigma) - \tanh(\beta m(\sigma) + \beta h)$. In that case (1.8) is no longer satisfied exactly, but we still have $r_0 \le \beta/n$. The key here is to find $\varphi(\sigma, \sigma')$ such that $\mathbb{E}^\sigma\varphi(\sigma, \sigma')$ yields 'something interesting'. In Chatterjee (2007) this construction is used to prove concentration of measure results for such $W$.
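For small $n$ these claims can be checked by brute force. The sketch below (our own code) enumerates $\{-1,1\}^n$ and uses the single-site Gibbs sampler to build $(\sigma, \sigma')$. It verifies the formula for $\mathbb{E}^\sigma\varphi(\sigma,\sigma')$ and compares $\mathbb{E}\{Gf(W') - Gf(W)\}$ with $\mathbb{E}\{Wf(W)\}$ in floating point. It assumes the Gibbs measure (3.9) with pair coupling $\beta/n$ and field $\beta h$, the normalisation implied by the $\tanh(\beta m_i(\sigma) + \beta h)$ formula.

```python
import math
from itertools import product


def curie_weiss_check(n, beta, h, f):
    """Return (E{G f(W') - G f(W)}, E{W f(W)}) for Construction 1D with the Gibbs sampler pair."""
    configs = list(product((-1, 1), repeat=n))

    def log_weight(s):  # exponent in (3.9)
        pairs = sum(s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        return beta / n * pairs + beta * h * sum(s)

    weights = [math.exp(log_weight(s)) for s in configs]
    Z = sum(weights)

    def field(s, i):  # beta * m_i(sigma) + beta * h
        return beta * (sum(s) - s[i]) / n + beta * h

    def W(s):  # m(sigma) - (1/n) sum_i tanh(beta m_i(sigma) + beta h)
        return (sum(s) - sum(math.tanh(field(s, i)) for i in range(n))) / n

    lhs = rhs = 0.0
    for s, wgt in zip(configs, weights):
        p_s = wgt / Z
        rhs += p_s * W(s) * f(W(s))
        e_phi = 0.0  # E^sigma phi(sigma, sigma') with phi = sigma_I - sigma'_I
        for i in range(n):
            p_up = (1 + math.tanh(field(s, i))) / 2  # P(sigma_i = +1 | other sites)
            for new, q in ((1, p_up), (-1, 1 - p_up)):
                s_new = s[:i] + (new,) + s[i + 1:]
                G = -(s[i] - new) / 2
                lhs += p_s * q / n * G * (f(W(s_new)) - f(W(s)))
                e_phi += q / n * (s[i] - new)
        assert math.isclose(e_phi, W(s), abs_tol=1e-12)
    return lhs, rhs


lhs, rhs = curie_weiss_check(n=5, beta=1.3, h=0.2, f=math.sin)
print(lhs, rhs)
```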

3.1.3. Finding an antisymmetric G through a Poisson equation. Assume now that $(X, X')$ is an exchangeable pair on some space $\mathcal{X}$, and that $W = \varphi(X)$ and $W' = \varphi(X')$ for some functional $\varphi: \mathcal{X} \to \mathbb{R}$ with $\mathbb{E}\varphi(X) = 0$. Chatterjee (2005, Section 4.1) proposes a general approach to find $G$ of a special form. Let $G = \frac12(\psi(X') - \psi(X))$ for some unknown functional $\psi: \mathcal{X} \to \mathbb{R}$ (in fact, any antisymmetric function can be written in this form; see Stein (1986)). Using exchangeability, it is not difficult to see that (1.8) is satisfied if

$$\psi(x) - P\psi(x) = \varphi(x)$$

for every $x \in \mathcal{X}$. We recognize this as a Poisson equation with kernel $P\psi(x) := \mathbb{E}^{X=x}\psi(X')$, for given $\varphi$ and unknown $\psi$. A general (formal) solution is $\psi(x) = \sum_{k=0}^\infty P^k\varphi(x)$. We have the following.

Construction 1E. Let $(X, X')$ be an exchangeable pair on a measure space $\mathcal{X}$ and let $\varphi: \mathcal{X} \to \mathbb{R}$ be a measurable function such that $\mathbb{E}\varphi(X) = 0$; define $P$ as above. If there is a constant $C > 0$ such that

$$\sum_{k=0}^\infty\bigl|P^k\varphi(x) - P^k\varphi(y)\bigr| \le C \qquad (3.10)$$

for every $x, y \in \mathcal{X}$, then

$$(W, W', G) = \Bigl(\varphi(X), \varphi(X'), \frac12\sum_{k=0}^\infty\bigl(P^k\varphi(X') - P^k\varphi(X)\bigr)\Bigr)$$

is a Stein coupling.

Note that boundedness, $|G| \le C/2$, is built in through (3.10), so this construction is a natural candidate for Theorem 2.5. We can give a more constructive version of this coupling.

Construction 1F. Assume that $(X, X')$, $\varphi$ and $P$ are as in Construction 1E. Assume that we have two Markov chains $(X_n)_{n\ge 0}$ and $(X_n')_{n\ge 0}$ with transition dynamics given by $P$, and with $X_0 = X$ and $X_0' = X'$. Assume further that, for all $n$,

$$\mathcal{L}(X_n \mid X, X') = \mathcal{L}(X_n \mid X), \qquad \mathcal{L}(X_n' \mid X, X') = \mathcal{L}(X_n' \mid X'). \qquad (3.11)$$

Now let $T = \inf\{n \ge 0 \mid X_n = X_n'\}$ be the coupling time of the two chains, and assume that $T < \infty$ almost surely. If, given $T$, $I$ is uniformly distributed on $\{0, 1, 2, \dots, T - 1\}$, then

$$(W, W', G) = \Bigl(\varphi(X), \varphi(X'), \tfrac12 T\bigl(\varphi(X_I') - \varphi(X_I)\bigr)\Bigr) \qquad (3.12)$$

is a Stein coupling.

Indeed, from

$$\mathbb{E}\{\varphi(X_k)f(W)\} = \mathbb{E}\{P^k\varphi(X)f(W)\}$$

and

$$\mathbb{E}\{\varphi(X_k')f(W)\} = \mathbb{E}\{f(W)\,\mathbb{E}^{X,X'}\varphi(X_k')\} = \mathbb{E}\{f(W)P^k\varphi(X')\},$$

and since the two chains agree from time $T$ on, we easily obtain

$$\mathbb{E}\{T(\varphi(X_I') - \varphi(X_I))f(W)\} = \mathbb{E}\Bigl\{\sum_{k=0}^{T-1}\bigl(\varphi(X_k') - \varphi(X_k)\bigr)f(W)\Bigr\} = \sum_{k=0}^\infty\mathbb{E}\bigl\{(P^k\varphi(X') - P^k\varphi(X))f(W)\bigr\},$$

and, similarly,

$$\mathbb{E}\{T(\varphi(X_I') - \varphi(X_I))f(W')\} = \sum_{k=0}^\infty\mathbb{E}\bigl\{(P^k\varphi(X') - P^k\varphi(X))f(W')\bigr\} = \sum_{k=0}^\infty\mathbb{E}\bigl\{(P^k\varphi(X) - P^k\varphi(X'))f(W)\bigr\},$$

where the second step uses exchangeability. Hence, it follows from Construction 1E that (3.12) is a Stein coupling; see Chatterjee (2005, Section 4.1) for more details, and Makowski and Shwartz (1994) for the general theory of Poisson equations.
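A small numerical illustration of Construction 1E, using a hypothetical three-state chain of our own choosing, is given below. The transition matrix is symmetric, so the uniform law is stationary and reversible, and $(X, X')$ with $X$ uniform and $X' \sim P(X,\cdot)$ is exchangeable. The sketch solves the Poisson equation by truncating the series $\sum_k P^k\varphi$ and checks the residual of $\psi - P\psi = \varphi$. It then confirms (1.8) with $G = \frac12(\psi(X') - \psi(X))$. The chosen $\varphi$ is not an eigenvector of $P$, so this $G$ is not simply a multiple of $W' - W$.

```python
import math

# Hypothetical example: symmetric P, so the uniform law is stationary and reversible.
P = [[0.5, 0.3, 0.2],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
phi = [2.0, -1.0, -1.0]  # mean zero under the uniform law


def apply_P(v):
    """(P v)(x) = E^{X = x} v(X')."""
    return [sum(P[x][y] * v[y] for y in range(len(v))) for x in range(len(v))]


def poisson_solution(terms=200):
    """psi = sum_{k >= 0} P^k phi, truncated after `terms` terms."""
    psi = [0.0] * len(phi)
    v = list(phi)
    for _ in range(terms):
        psi = [a + b for a, b in zip(psi, v)]
        v = apply_P(v)
    return psi


def stein_identity(f):
    """Return (E{G f(W') - G f(W)}, E{W f(W)}) for G = (psi(X') - psi(X)) / 2."""
    m = len(phi)
    psi = poisson_solution()
    lhs = rhs = 0.0
    for x in range(m):
        rhs += phi[x] * f(phi[x]) / m
        for y in range(m):
            G = 0.5 * (psi[y] - psi[x])
            lhs += P[x][y] * G * (f(phi[y]) - f(phi[x])) / m
    return lhs, rhs


psi = poisson_solution()
residual = max(abs(a - b - c) for a, b, c in zip(psi, apply_P(psi), phi))
lhs, rhs = stein_identity(lambda w: w**3)
print(residual, lhs, rhs, math.isclose(lhs, rhs))
```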

3.2. Local dependence and related couplings. This is one of the earliest versions of Stein's method. In what follows, let $I$ be uniformly distributed on $[n]$, independent of all else.

Construction 2A. Let $W = \sum_{i=1}^n X_i$ with $\mathbb{E}X_i = 0$. For each $i$, let $W_i'$ be such that

$$\mathbb{E}(X_i \mid W_i') = 0. \qquad (3.13)$$

Then $(W, W', G) = (W, W_I', -nX_I)$ is a Stein coupling.
To see this, note that on the one hand

$$-\mathbb{E}\{Gf(W)\} = \sum_{i=1}^n\mathbb{E}\{X_if(W)\} = \mathbb{E}\{Wf(W)\}, \qquad (3.14)$$

and on the other hand

$$\mathbb{E}\{Gf(W')\} = -\sum_{i=1}^n\mathbb{E}\{X_if(W_i')\} = 0,$$

due to (3.13); hence (1.8) is satisfied.

The choice $G = -nX_I$ was first considered by Stein (1972) for $m$-dependent sequences; however, this $G$ has broader applications. We now discuss some more detailed constructions of $W'$.



3.2.1. Local dependence. Local dependence was studied extensively in Chen and Shao (2004) under various dependence settings, but the approach goes back to Stein (1972) and Chen (1975); a version for discrete random variables is given by Röllin (2008b). We use the simplest form as a starting point.

Construction 2B. Assume that $W$ and $G$ are as in Construction 2A. Assume in addition that, for each $i \in [n]$, there is $A_i \subset [n]$ such that $X_i$ and $(X_j)_{j\notin A_i}$ are independent. Then, with $W_i' = W - \sum_{j\in A_i}X_j$, (3.13) is satisfied.

This first-order dependence is usually referred to as (LD1) and is enough to obtain a Stein coupling. However, it is possible to extend this coupling.

Construction 2C. Assume that $W$, $G$ and $W_i'$ are as in Constructions 2A and 2B, and that $\operatorname{Var}W = 1$. Assume in addition that there is $B_i \subset [n]$ such that $A_i \subset B_i$ and $(X_j)_{j\in A_i}$ and $(X_j)_{j\notin B_i}$ are independent. Define $W_i'' = W - \sum_{j\in B_i}X_j$; then $W'' := W_I''$ is independent of $GD$ and hence $r_3 = 0$.
The conditions of Constructions 2B and 2C together are referred to as (LD2).
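As a concrete instance of Constructions 2A and 2B, take the 2-run indicators on a circle, $X_i = \xi_i\xi_{i+1} - p^2$, with independent $\mathrm{Be}(p)$ variables $\xi_i$. Then $X_i$ is independent of $(X_j)_{j\notin A_i}$ for $A_i = \{i-1, i, i+1\}$ (indices mod $n$, $n \ge 4$). The sketch below (our own example) verifies (1.8) for $G = -nX_I$ and $W' = W - \sum_{j\in A_I}X_j$ by exact enumeration.

```python
from fractions import Fraction
from itertools import product


def local_dependence_check(n, p, f):
    """X_i = xi_i xi_{i+1} - p^2 on a circle, A_i = {i-1, i, i+1}.
    Returns (E{G f(W') - G f(W)}, E{W f(W)}) for G = -n X_I, W' = W - sum_{j in A_I} X_j."""
    lhs = Fraction(0)
    rhs = Fraction(0)
    for xi in product((0, 1), repeat=n):
        prob = Fraction(1)
        for x in xi:
            prob *= p if x else 1 - p
        X = [xi[i] * xi[(i + 1) % n] - p * p for i in range(n)]
        W = sum(X)
        rhs += prob * W * f(W)
        for i in range(n):
            A_i = {(i - 1) % n, i, (i + 1) % n}
            W_prime = W - sum(X[j] for j in A_i)  # independent of X_i
            G = -n * X[i]
            lhs += prob * Fraction(1, n) * G * (f(W_prime) - f(W))
    return lhs, rhs


lhs, rhs = local_dependence_check(5, Fraction(1, 4), lambda w: w**3)
print(lhs == rhs)
```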

3.2.2. Decomposable random variables. This version of local dependence was popularized by Barbour et al. (1989) for smooth test functions, by Raič (2004) for Kolmogorov distance and by Röllin (2008b) for total variation approximation of discrete random variables. It uses a refined version of the concept of second-order neighbourhood and makes use of non-trivial $\tilde D$ and $S$.

Construction 2D. Assume that $W$, $G$ and $W_i'$ are as in Constructions 2A and 2B. Assume in addition that, for each $i$ and each $j \in A_i$, there is $B_{ij} \subset [n]$ such that $A_i \subset B_{ij}$ and such that $(X_i, X_j)$ is independent of $(X_k)_{k\notin B_{ij}}$. Let $K_i = |A_i|$ and define $\tilde D = -K_IX_J$, where, given $I$, $J$ is uniformly distributed on $A_I$ but independent of all else. Then $\mathbb{E}^X(GD) = \mathbb{E}^X(G\tilde D)$, hence $r_1 = 0$. Let $S = nK_I\sigma_{IJ}$ where $\sigma_{ij} = \mathbb{E}(X_iX_j)$; then $r_2 = 0$. Define $W_{ij}'' = W - \sum_{k\in B_{ij}}X_k$; then $W'' := W_{IJ}''$ is independent of $G\tilde D$ and $S$, and hence $r_3 = 0$.

Hence, if we can choose $B_{ij}$ such that $B_{ij} \subset B_i$, where $B_i$ is as in the standard (LD2) local dependence setting of Construction 2C, we should be able to improve our bounds. This is because $W'' - W = -\sum_{k\in B_{IJ}}X_k$ contains fewer summands than $\sum_{k\in B_I}X_k$ from Construction 2C.

Note that, under third moment conditions, Barbour et al. (1989) obtain a Wasserstein bound of order

$$\sum_{i=1}^n\mathbb{E}|X_iZ_i^2| + \sum_{i=1}^n\sum_{j\in A_i}\bigl(\mathbb{E}|X_iX_jV_{ij}| + |\mathbb{E}(X_iX_j)|\,\mathbb{E}|Z_i + V_{ij}|\bigr) \qquad (3.15)$$

with

$$Z_i := \sum_{j\in A_i}X_j, \qquad V_{ij} := \sum_{k\in B_{ij}\setminus A_i}X_k$$

(in Barbour et al. (1989) the coarser expression $\mathbb{E}|X_iX_j|$ is used, but it is easy to see that this can be sharpened to $|\mathbb{E}(X_iX_j)|$). Using Corollary 2.3, we obtain an order of

$$\sum_{i=1}^n\mathbb{E}|X_iZ_i^2| + \sum_{i=1}^n\sum_{j\in A_i}\bigl(\mathbb{E}|X_iX_j(Z_i + V_{ij})| + |\mathbb{E}(X_iX_j)|\,\mathbb{E}|Z_i + V_{ij}|\bigr). \qquad (3.16)$$

In most cases we can expect these two bounds to yield similar results. Indeed, up to constants, a useful upper bound on both estimates is

$$\sum_{i=1}^n\sum_{j\in A_i}\sum_{k\in B_{ij}}\bigl(\mathbb{E}|X_iX_jX_k| + |\mathbb{E}(X_iX_j)|\,\mathbb{E}|X_k|\bigr).$$

Consider von Mises statistics as an example: for independent random variables $X_1,\dots,X_n$, we have $W = \sum_{p,q}\varphi_{p,q}(X_p, X_q)$ for some functionals $\varphi_{p,q}$. Clearly, we can choose $A_{(p,q)} = \{(k,l) : k \in \{p,q\} \text{ or } l \in \{p,q\}\}$ as first-order neighbourhood. In the standard local approach framework, however, we would need $B_{(p,q)} = [n]\times[n]$ for (LD2), so that $W'' = 0$ and hence $|D'| = |W|$, which would not yield useful bounds. In the refined setting we can choose, for every $(p',q') \in A_{(p,q)}$, the set $B_{(p,q),(p',q')} := \{(k,l) : k \in \{p,q,p',q'\} \text{ or } l \in \{p,q,p',q'\}\}$, so that $|B_{(p,q),(p',q')}|$ is now only of order $n$. Of course, whether normal approximation is appropriate at all depends on the concrete choice of the functionals $\varphi_{p,q}$; see Barbour et al. (1989) for applications to statistics related to random graphs.

3.2.3. Special case: quadratic forms. Let $\xi_1,\dots,\xi_n$ be independent, centered random variables with unit variance. Let $A = (a_{ij})$ be a real symmetric $(n\times n)$-matrix, and let $W = \sum_{i,j}a_{ij}\xi_i\xi_j - \sum_i a_{ii}$. It would be straightforward to use the above method of decomposable random variables in this situation. However, the multiplicative structure offers an interesting alternative (in $U$-statistics language, the kernel $\varphi_{ij}(x,y) = a_{ij}xy$ is degenerate for centered random variables).

Construction 2E. Let $W$ be as above. Let also $Y_i := \sum_j a_{ij}\xi_j$, $G = -n(\xi_IY_I - a_{II})$ and $W' = W - (2\xi_IY_I - a_{II}\xi_I^2)$. Then this defines a Stein coupling.

It is not difficult to see that (1.8) holds. Again, whether we can expect normal-like behaviour of $W$ depends on the matrix $A$. Essentially this coupling was used by Luk (1994) in the context of $\chi^2$-approximation for the case where all entries of $A$ are 1, which corresponds to the square of a sum of random variables.
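That Construction 2E satisfies (1.8) can be confirmed by brute force, using Rademacher $\xi_i$ (centered, unit variance) and a small symmetric matrix of our own choosing. The sketch enumerates all sign vectors and uses exact rational arithmetic. It relies on $W'$ being exactly $W$ with every term containing $\xi_I$ removed.

```python
from fractions import Fraction
from itertools import product


def quadratic_form_check(a, f):
    """Return (E{G f(W') - G f(W)}, E{W f(W)}) for Construction 2E with Rademacher xi_i."""
    n = len(a)
    trace = sum(a[i][i] for i in range(n))
    lhs = Fraction(0)
    rhs = Fraction(0)
    for xi in product((-1, 1), repeat=n):
        prob = Fraction(1, 2**n)
        W = sum(a[i][j] * xi[i] * xi[j] for i in range(n) for j in range(n)) - trace
        rhs += prob * W * f(W)
        for I in range(n):
            Y = sum(a[I][j] * xi[j] for j in range(n))            # Y_I
            G = -n * (xi[I] * Y - a[I][I])
            W_prime = W - (2 * xi[I] * Y - a[I][I] * xi[I] ** 2)  # drops all terms with xi_I
            lhs += prob * Fraction(1, n) * G * (f(W_prime) - f(W))
    return lhs, rhs


A = [[2, 1, 0, -1],
     [1, -1, 3, 0],
     [0, 3, 1, 2],
     [-1, 0, 2, 0]]
lhs, rhs = quadratic_form_check(A, lambda w: w**3)
print(lhs == rhs)
```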



3.2.4. Local exchangeable randomization. This coupling was proposed in Reinert (1998). Its use was limited by the following fact: if the classic exchangeable pairs approach (as discussed in Subsection 3.1) is used along with this coupling, the linearity condition (3.2) will in general not be satisfied with $R$ small enough. With the choice $G = -nX_I$, however, we can now handle this coupling, although some care is needed.

Construction 2F. Let $W$ and $G$ be as in Construction 2A. Let $(X_{ij}')_{i,j\in[n]}$ be a collection of random variables such that, with $W_i' = \sum_{j=1}^n X_{ij}'$, we have that

(i) for each $i$, $X_{ii}'$ is independent of $W$, (3.17)

(ii) for each $i$, $\bigl((X_k)_k, (X_{ik}')_k\bigr)$ is exchangeable. (3.18)

Then (3.13) is satisfied.

It is often not too difficult to construct $(X_{ij}')_j$ for a given $i$ such that $\mathcal{L}(W_i') = \mathcal{L}(W)$ and such that $X_{ii}'$ is independent of $W$, and hence $\mathbb{E}^WX_{ii}' = 0$. However, it is important to note that this does not suffice, as we ultimately need $\mathbb{E}^{W_i'}X_i = 0$; this is guaranteed under the additional Condition (3.18).



In Reinert (1998), it was incorrectly deduced from (3.17) and the property

$$\mathcal{L}(X_{ij}', j \ne i \mid X_{ii}' = x) = \mathcal{L}(X_j, j \ne i \mid X_i = x) \qquad (3.19)$$

that $(W, W_i')$ is exchangeable. It is not difficult to find examples for which $(W, W_i')$ is not exchangeable although (3.17) and (3.19) hold; see Remark …
3.3. Size-biasing. This approach was introduced in Baldi, Rinott, and Stein (1989) and further explored in Goldstein and Rinott (1996), Dembo and Rinott (1996), Goldstein and Penrose (to appear) and others.



Construction 3A. Let $V$ be a non-negative random variable with $\mathbb{E}V = \mu > 0$. Let $V^s$ have the size-biased distribution of $V$, that is, for all bounded $f$,

$$\mathbb{E}\{Vf(V)\} = \mu\,\mathbb{E}f(V^s). \qquad (3.20)$$

Then

$$(W, W', G) = (V - \mu, V^s - \mu, \mu)$$

is a Stein coupling.

Using (3.20), we obtain

$$\mathbb{E}\{Gf(W') - Gf(W)\} = \mathbb{E}\{\mu f(V^s - \mu) - \mu f(V - \mu)\} = \mathbb{E}\{Vf(V - \mu) - \mu f(V - \mu)\} = \mathbb{E}\{Wf(W)\},$$

so that (1.8) is indeed satisfied.
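For a binomial $V \sim \mathrm{Bin}(m,p)$, the size-biased version is $V^s = 1 + \mathrm{Bin}(m-1,p)$, a standard fact for sums of independent Bernoulli variables. This makes Construction 3A easy to check exactly. The sketch below (our own example) verifies both (3.20) and (1.8) in exact arithmetic for polynomial test functions.

```python
from fractions import Fraction
from math import comb


def binom_pmf(m, p, k):
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def size_bias_check(m, p, f):
    """V ~ Bin(m, p) with size-biased version V^s = 1 + Bin(m - 1, p); p is a Fraction."""
    mu = m * p
    V = {k: binom_pmf(m, p, k) for k in range(m + 1)}
    Vs = {1 + k: binom_pmf(m - 1, p, k) for k in range(m)}
    # (3.20): E{V f(V)} = mu E f(V^s)
    assert sum(q * k * f(k) for k, q in V.items()) == mu * sum(q * f(k) for k, q in Vs.items())
    # (1.8) with W = V - mu, W' = V^s - mu and G = mu
    lhs = (mu * sum(q * f(k - mu) for k, q in Vs.items())
           - mu * sum(q * f(k - mu) for k, q in V.items()))
    rhs = sum(q * (k - mu) * f(k - mu) for k, q in V.items())
    return lhs, rhs


lhs, rhs = size_bias_check(6, Fraction(2, 7), lambda x: x**3 - 2 * x)
print(lhs == rhs)
```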

One advantage of this approach becomes apparent when bounds in the Kolmogorov metric are to be obtained. In the light of Theorem 2.5, $G/\sigma$ is already bounded by $\alpha = \mu/\sigma$, so we only need to concentrate on finding a bounded coupling $(W, W')$; see Goldstein and Penrose (to appear) for such a coupling in the context of coverage problems.

3.4. Interpolation to independence. For this coupling the key idea is to construct a sequence of random variables that 'interpolates' between $W$ and an independent copy of $W$ by means of small perturbations. A special case of this coupling was introduced by Chatterjee (2008). The construction has apparent similarities to Lindeberg's telescoping sum in his proof of the CLT for sums of independent random variables. In the following construction, let $I$ be uniformly distributed on $[n]$ and independent of all else.

Construction 4A. Assume ~E,W = 0. Assume that for each i G [n] we have 
a W[ which is close to W. Assume that there is a sequence of random vari- 
ables Vq, V\, . . . , V n such that E w Vq = W and such that V n is independent 
of Vq and assume that, for every i £ [n], 

J?[(W,V i ^ 1 ),(WiV [ )]=J?[(W;,V i ),(W,V^ 1 )] (3.21) {42} 

for every i G [n] . Then 

(W, W, G) = (W, W'i, f(Vr - Vi-i)) (3.22) {43} 

is a Stein coupling. 

Note that ()3.2ip implies in particular that (W, W-) is an exchangeable 
pair for each i and also that Jz?(Vi) = Jz?(Vo) for all i by induction. We have 



i=l 

= ^{(V n -V )f(W)} 
= -^{Wf(W)}, 
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due to the independence assumption, and, due to (|3.2ip . 

E{Gf(W')} = ±J2E{(V i -V i - 1 )f(Wl)} 

i=l 



n 
i=l 



2 

i=l 



= -E{G/(W)} = \¥,{Wf(W)}. 
Hence, (|3.22f) is a Stein coupling, indeed. 

3.4.1. Functionals of inde pendent ra n dom variables. A specific version of 
this coupling was used bv lChatteried ( 20081 ) for functionals of independent 



random variables. We give a simpler versio n first and discuss then the 
(implicitly used) coupling of IChatterjee ( 20081 ). 



Construction 4B. Let X = (X± , . . . , X n ) be a collection of independent 
random variables and let W = F{X) be any functional of X such that 
E-F(Jf) = 0. Let X' = (X[, . . . , X' n ) be an independent copy of X and 
define for all subsets A C [n] the vectors X A = (X A , . . . , X A ) by 

X A = [ X 'i ? <€ 4 

[Xi //;•• ,i. 

that is, X A is simply X but with all Xi replaced by X[ for which i S A; 
define also W' A = F(X A ). Let W[ := W' {i} for each i e [n] and Vi := 
for each i G [n]U {0}. Then the conditions of Construction \4A] are satisfied. 



Clearly, V n is independent of Vo and it is not difficult to see that (|3.2ip 
holds. The interpolating sequence is therefore constructed simply by re- 
placing the Xi by X- in increasing order. For this coupling to be useful we 
would typically need that F is not too sensitive to changes in the individual 
coordinates. 

The implicit coupling used by Chatterieel (2008) is different in the sense 



that, instead of using a fixed order in which the Xi are replaced, a random 
order is used. 

Construction 4C. Assume that W, F , X and X' are as in \4B\ Let U be a 

uniformly drawn random permutation of length n, independent of everything 
else. For any permutation tt we denote by ir(A) simply the image of A 
with respect to tt. Define now W[ := W/ n /^i and Vi := W^any Then the 



conditions of Construction \4A\ are satisfied. 

Exchangeability (|3.2ip follows from Construction l4Bl bv conditioning on II. 
Let us now prove that Co nstruction I4CI indeed leads to the representation 
used bv lChatteriel (|2008h . Clearly, G = ^(W n([/]) - WL^^X thus 



i=l it 



STEIN COUPLINGS FOR NORMAL APPROXIMATION 



27 



We re-write the sum over all permutation as a sum over all possible subsets 
induced by 7r([£— 1]), that is, all possible subsets A C [n] with \A\ = and 
over all possible values of ir(i) which range over [n] \A. Taking into account 
multiplicities from all possible permutations within the sets ir([i — 1]) and 
vr([n] \ [i]) we obtain 

1E X > X '(GD) 

= 2^E E Y,\Mn-\A\-l)\{W> Au{j} -W> A ){W-W{ j} ) 

1=1 AC[n], jdA 

\A\=i-l 

^MM^ iw ' A ^- w ' A)(w - w ' m) ' 

which is exactly the expression used bv lChatteri 

3 (|2008l . Eq. (1)). 



3.5. L ocal symmetry. An instance of this coupling was used by IChen 
( 19981 ) for sums of independent random variables. 

Construction 5A. Assume that W , W , G a and G@ are random variables 
such that 

®[G a f(W')} = -E{Gf,f(W% (3.23) {45} 

E{G a /(W)} = E{Wf{W)}, (3.24) {46} 

®{Gpf(W)} = 0, (3.25) {47} 

for all f for which the expectations exist. Then, (W, W, Gp — G a ) is a Stein 
coupling. 

Indeed, using (|3.23|) for the first equality and then (|3,24[) and (|3.25p . 

E{(G /3 - G a )(f(W) - f(W))} = E{(G a - Gp)f(W)} = B{Wf(W)}. 

Note that we refer to Condition (|3.23[) as local symmetry due to the following 
example. 

Let X = (Xi,...,X n ) be a sequence of centered independent random 
variables and let X' be an independent copy of X. Let W = Y2i %i an d 
assume that V&rW = 1. Define G a = Xi and W = W + Xj (we 'duplicate' 
a small part of W). Define also Gr = Xj. Then it is not difficult to verify 
Conditions (|3.23|) and (|3.24|) . Identity (|3.23|) is due to the symmetry of G a 
and Gp relative to W , which is the crucial aspect of the construction. 

3.6. Abstract approaches. One might wonder if, for a given arbitrary 
coupling (WiW 7 '), one can always find G to make (W,W',G) a Stein cou- 
pling. 

Construction 6A. Let (W, W) be a pair of integrable random variables. 
Let J- and J- 1 be two a-algebras with o~(W) C J- and o~(W') C T' . Let V be 
a random variable such that 

1E W V = W. (3.26) {47b} 

Define (formally) the random variable 

G = -V + JE{V\T') - E(E(F|J r, )|-7 7 ) + E(E(E(F| F')\F)\F') - .... 
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Construction 


w = w 


(W',W) 
exchangeable 


W"^W 


Hoeffding (Var. 1) 


H£l(p.H7) 


yes 


yes 




Hoeffding (Var. 2) 


l2Flfp.l2I) 


yes 


yes 




Hoeffding (Var. 3) 


15X1 (p. 127b 


no 


no 


yes 


Occupancy 




yes 


yes 


yes 


Neighbourhood 


[2X1 (p. [221 


no 


no 


no 


Random graphs 


[2X1 (p. [221 


yes 


no 


yes 



Table 1. Overview over the couplings used in the different 
applications along with some interesting properties of the 
involved random variables. 



If the above sequence converges absolutely almost surely, then (W,W',G) is 
a Stein coupling. 

To see this, first condition G on T' which yields E(G|J r/ ) = 0. On the 
other hand, E(G|J r ) = -E(F|J C ") hence E, W G = -W and flES]) follows. 
Note that (|3.26|) is not to be confused with the usual linearity condition 
(|3.ip — we can always take V = W to satisfy (|3.26p . 

Consider the example W = Y^=i-^h a sum °f independent, centered 
random variables. With / independent and uniformly distributed on [n], let 
W = W -Xj. Take V = W and note that 

E(W|W') = E(W + X^W') = W, ¥,{W'\W) = (1 - i)W. 

Hence, 

-G = W - W + (1 - l)W - (1 - l)W + (1 - \) 2 W - (1 - \fW + . . . 
= X I + (1- + (1 - IfXi + ... = nXj. 

Alternatively, chosing V = nXj yields the same result directly, as E w V = 
and hence G = —V = —nXj. 

4. Applications 

In this section we give some applications of the main theorems and corol- 
laries using different couplings from Section [3l Table [1] gives an overview 
over the different couplings we use in this section along with some important 
characteristics of the involved random variables. 

4.1. Hoeffding's combinatorial statistic. Let aij, 1 ^ i, j ^ n, be real 
numbers such that Yjt=i a i,k = Y^k=i a k,j = an d ^rj a i,j = 1- Let 7r 
be a uniformly chosen random permutation of size n and W = Ya=1 a i,ir(i)- 
Then it is routine to see that = and Var W = 1. Note that, for a Stein 
coupling (W, W',G), unit variance of W implies E(G-D) = 1. Let in what 
follows I\ and I2 be independent and uniformly chosen random numbers 
from [n]. 
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Variant 1 (c.f. Construction I1A|) . Define it' = ir o (h I2) and W' = 

Ya=i a i,ir'(i)- Then (W, W) is a classical exchangeable pair, i.e. ([3.1 p holds 
with A = 2/n, or, equivalently, G = j{W — W) = j(ai u7r (i 2 ) + a / 2 ,7r(/i) ~~ 
ffl 7i,7r(Ji) ~~ a /2,vr(/ 2 )) ma kes (W,W',G) a Stein coupling. 

Variant 2 (c.f. Construction I2F|) . Define I4 7 ' as in Variant 1. With 
G = —na^ ^fjj), (W,W',G) is also a Stein coupling. Thi s cou pling is the 
(impli cit) basis for the construction in Ho and Chen ( 19781 ) and iBolthausen 



(1984). 



In both of the previous variants, our W' is defined with respect to a 
perturbation ir 1 of ir. Thus, W can be seen as an instance of the replacement 
perturbation from the introduction. This comes at the cost of D having 
four terms. One may wonder whether a deletion construction is possible, 
where W is defined by just 'removing a random small part' of W (see 
Section [3.2. ip . This is possible, indeed, so that we do not need to go through 
constructing ir'. Despite the fact that the following construction is very 
simple, it has gone unnoticed in the literature so far. 

Variant 3 (c.f. Construction |5A"|) . Define 

W' = W - l^ ail ' n{h) + ai ^<^ if 7l ^ /2 ' 
l a /i,7r(/i) if h = h- 

Let G = n(aj lj7r n 2 ) — a Il7T ^); then (W,W',G) is a Stein coupling. To 
see this, note first that a(W) C T := crf/i, I2, (7r(i); i 7^ ii,^))- Now, if 
A 7^ h, the conditional distributions Jzf (^(/i)^) and ££ (k(J2)\T) are equal 
and assign probability 1/2 to each of the points in the set {7r(Ji),7r(/2)} 
so that E{G/(W)} = 0. The same arguments from Variant 1 lead to 
E{Gf(W)} = -E{Wf(W)}. 

To see the connection with Construction I5AI let G a = naj^n^ and 
G/3 = naj lj7r n 2 y, then ()3.23p - (j3.25p are satisfied. 

Let us quickly illustrate how to obtain a bound in terms of 

||o|| := sup |ajj|. 

With the Stein coupling from Variant 3 we have 

|G| < 2n\\a\\ =: a, \D\ < 2||a|| =: f3. 

We will make use of an auxiliary variable W", which can be constructed so 
that it is independent of (2i, I2, 7r(ii), ^(-^2)) and such that 

\D'\ ^ 8\\a\\ =: ft. (4.1) 

Hence, applying Theorem l2.5l with the above random variables and constants 
and in addition D = D and f3 = f3 we easily obtain the following result. 

Theorem 4.1. With W and \\a\\ as above, 

d K (j^(WO,N(0,l)) < 448n||a|| 3 +96||a||. 

Proof. We o nly need the existe nce of W" as claimed above; we use the con- 
struction of Bolthausen ( 19841 ). It turns out that it is more convenient to 



construct W from W". Let r be a uniformly chosen random permutation of 
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[n] and let W" = ^ Oi, T (i)- Let {I\,l2, Ji, J-i) be random variables indepen- 
dent of r such that (I\,l2, J\) is uniform on [n] 3 , such that J2 is uniform on 
[n] \ {Ji} if Ji 7^ J2 and such that Ji = J2 if Ii = ^2- One can now construct 
a permutation tt which again has uniform distribution, and such that 



7T(/l) = Jl, 7T(J 2 ) 

and such that r and 7r differ in at most four positions. Now IE 7 " (Gs(W^ — 
W)) = E,(Gs(W-^ — W)) (and hence = 0) follows from the indepen- 
dence assumption between r and (Ji, I2, Ji, J 2 )- Note that the calculations 
from Variant 3 above still hold, as, given tt, (Ji, J 2 ) is uniformly distributed 



on n 



hence independent of 7r as required. 



□ 



Remark 4.1. Note that IGoldsteinl (|2005l h using zero-biasing, obtains 

d K (^f(PF),N(0,l)) sC 1016||o|| + 768||a|| 2 . 

4.2. Functionals in the classic occupancy scheme. Let m balls be dis- 
tributed independently of each other into n boxes such that the probability of 
landing in box i is Pi, where X^LjJ^i = \ - The literature on this topic is rich; 



see for ex ample I Johnson and Kotz ( 1977 l.lKqlchin. SevastVanov. and Chistvakov 



( 19781 ) or Barbour. Hoist, and Janson (11992T). but als o more recent results 
such as iHwang and .Tansonl (|2008h and lBarbourl l|2009h on local limits theo- 
rems for infinite number of urns. If denotes the number of balls in urn i 
after distributing the balls, some interesting statistics can be written in the 
form 



U 



for functions h : Z + — > R. Examples are 

h(x) = I[x = k] urns with exactly k balls", 

h{x) = I[x > mo] urns exceeding a limit mo", 

h(x) = l[x > mo] (a; — mo) "# excess balls when urn limit is mo"; 

see for example iBoutsikas and Koutrad (120021 1. 
Let us consider here the more general case 



u = j2h i (a i ) 



(4.2) {52} 



i=i 



for functions hi : Z + — >• R, i = 1, . . . , n. Due to the subsequent centering, 
we may assume without loss of generality that hi(Q) = for all i. 

Theorem 4.2. Assume the situation as described above. Let W = (U — 
[1)1 o, where \i and a 2 are the mean and the variance of U. Define the 
quantities 



sup \\hi\\, 
l<i<n 



and assume that 



\Ah\\ = sup sup \\hi(j + 1) - hi(j)\\, p= sup pi, 



^ 1 and ||A/t|| 1. Then, if 



1 + Wmp < 41n(n||/t||) ^ 



(2p)V2' 



(4.3) {53} 
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we have 

dK(jSf(W),N(0,l)) 



409600n||A/i|| 3 ln(n||/i||) e (l + <r 2 /n) 3888 ln(n 



< » \" "> — + 



d 3 a 2 



The constants are large, but explicit. They can be reduced if stronger 
conditions than (|4.3f) are imposed, such as minimal values for n. In typical 
examples, where \\h\\ x ||A/t|| X a 2 /n x 1 we obtain a rate of convergence 
of 0(ln(n) 6 n -1 / 2 ). The correct rate of convergence for the number of empty 
urns of equip robable u r ns is Q(n~ 1 / 2 ), which is best possible and was ob- 
tained first by Englund (1981). Although our bound does not yield this rate, 
it is far more general. As a consequence of our result we have the following. 



Corollary 4.3. Sufficient conditions that U as in (|4.2[) is asymptotically 
normal as n — > oo are 

(i) \\Ah\\ remains bounded, (ii) mp = 0(ln(n)). 

(Hi) ln(n) 4 n 2 / 3 / Var U ->■ 0, (iv) p = 0(ln(n)- 2 ), 



Proof of Theorem Let us first state a simple, but key observation, which 
will be used in the proof: 

Fact A. Assume the situation as stated in the theorem and let K C [n] be 
arbitrary. Then, the joint distribution of the balls in the urns of K and 
K c is only connected through the total number of balls in these respective 
subsets. That is, given the number of balls in each urn of K and assuming 
there are a total of N balls in the urns of K, then the balls in K c are 
distributed as if m — N balls were distributed among the urns K c according 
to the distribution {pi/J^keK- Pk) i€K c- 

We will use Construction l2Fl to construct our Stein coupling. First, dis- 
tribute m balls into n white urns according to (j>i)i£[ n ] an d denote by £j the 
number of balls in white urn j. Fix i and note that 

J2f(&)=Bi(m,pi). (4.4) 

Let us now construct a family of random variables (£'ij)j£[n] sucri that (|3,17p 
and (|3.18p are satisfied. 

Assume we have an additional set of n black urns. Independent of all else, 
let ^ j have distribution (|4.4p and put that many balls into black urn i. We 
proceed with the remaining ^ •, j ^ i. First, for each j ^ i, put £j balls into 
black urn j. Let N\ = |£j — ^ J be the difference of balls in the white and 
black urn i. If £j > ^ i5 distribute an additional N\ balls into the remaining 
black urns according to the distribution (pfc/(l — Pij)^,^ If £i < ^ j, remove 
instead N\ balls from the remaining black urns, where the balls are chosen 
uniformly among all the balls in the black urns except those in black urn i. 
It is not difficult to see that the construction is in fact symmetric, that is, 

-^((£i)ie[n]> (€ij)je[n]) = ^{(C'i,j)j&[n], (6i)i€[n.])- 

Now, define U[ = £\ and W[ = (U { - n)/a. It is clear that (13TTD 

and (|3,18p hold. With / uniformly distributed on [n] and independent of all 
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and with /Xj := E/tj(£j), we hence have that 

(W, W, G) := (W, W'j, -n{hi{£i) - m)/a) 

is a Stein coupling. 

Fix again i. We now construct another family of random variables (d',j)je[n]- 
Denote by Ki = {j ^ i : ■ 7^ the set of indices of those urns for which 
the number of balls in the corresponding black urn and white urn differ, not 
including urn i, and K{ = KiL){i}. Assume that we have a set of n red urns. 
First, independently of all else, distribute N% ~ Bi(m, YlkeK- balls into 
the red urns from Ki according to the distribution (pj/YlkeKiPk) eK ' 
note by N2 = Ylj^Ri the number of balls in the white urns from Ki, and 
by N3 = \N2 — N%\ their difference. Now, in very much the same way as we 
constructed the balls in the black urns from those in the white urns, put £j 
balls into red urn j for each j £ Kf. If N2 > distribute another N3 balls 
into the red urns from Kf according to the distribution [pj /YlkeK c Pk) j£K CJ 
and if N2 < N' 2 ' remove N3 balls uniformly among all the balls in the red urns 
from Ki. Now, given ^j)^^., by construction, the (£ji)je[ n ] always will 
have the distribution of (£.j)je[n]- Hence, U" := Y^j=x ^ji^'/j) * s independent 
of XJ[ - U = >;,,,,•(//,<;) - kite-)) and fc, that is, W" = (U>{ - y)/a is 
independent of GD. 

We are now in a position to apply Theorem 12.51 with D = D and S = 1. 
We clearly have ro = r\ = r2 = r% = 0. Note that 

D = \ E M&j) - D ' = ^J2 M&j) - 

jeKj j€K'j 

where K[ = Ki U K[ with K[ = {k € Kf : ^ £ fe }. 

Let C > 0. We have \/ii\ ^ mjj||A/i|| for every i. Then, with a := 
nC\\Ah\\/a + nmp\\Ah\\/a, 

P[\G\ > a] < P[n|fc/(&)|/<7 > a - npn/a] ^ P[M&)I > C\\Ah\\] 
< P[fr >C]^ P[Bi(m,p) > C]. 
Furthermore, with /3 := 2C(C + l)||A/i||/er, 

F[|D| > 0\ < P& V £/ > C] + F[|D| > 0, £ V < C] 

< p[e; v 6 > c] + p& v & < c, 3? g ^ s.t. e; v & > c\ 

+ P[\D\> ^^VZi^CVieKj], 
and, noting that the last expression is zero, this is 

< P[£j > C] + P[fc > C] + 2CP[Bi(m,p/(l -p)) > C] 
sC 2P[Bi(m,p) > C] + 2CP[Bi(m,2p) > C\. 

Lastly, with (3' := 2C(C + 1) 2 ||A/i||/ct, 
P[|£>'| > /3'] 

^ p[ti v ei > c] + pod'i > /?', & v & < c] 

<p&v&>c] + p&v^c, £ ejv E ^->c(c + i)] 
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+p[|iyi v£/<c, £ ejv e 6<c(c+i)] 
+f[#v&<c, E(;v e Ci<c*(c+i),3ie^ s .t. efvCi>c] 

+ p [|d'| > /?', £ v & < c, ef V & ^ CWi e K'j] , 

and, noting that the last expression is zero, this is 

< 2P[Bi(m,p) > C] + P[Bi(m, (C + l)p/(l - p)) > C(C + 1)] 

+ 2C(C + l)P[Bi(m,p/(l-C(C+l)p) >C] 
^ 2P[Bi(m,p) > C] + P[Bi(m, 2(C + l)p > C{C + 1)] 

+ 2C(C + l)P[Bi(m, 2p) > C] 

if 

C(C+l)p<l/2. (4.5) {55} 

Now, from the Chernoff bound we have that, for every e ^ 0, 

.2 



P[Bi(ra,p) > (1 + e)np] < exp( 
which implies that 



V 2 + e 



np I , 



P[Bi(n,p) > x] < e _a;/2 (4.6) {56} 

as long as x > 5rep. Choose C = 41n(n||/i||) — 1; this implies (|4.5|) under 
Condition (|4.3|) . Also under Condition (|4.3|) . (|4.6|) can be used to obtain 

P[|G| > a] < P[Bi(m,p) > C] < — 



n- 

P[\D\ > 0\ < 2F[Bi(m,p) > C] + 2CP[Bi(m,2p) > C] 
4+ 161n(n|| 



n 2||f,||2 



P[\D'\ > 13'] ^2P[Bi(m,p) > C} + F[Bi{m,2{C + l)p> C(C + 1)] 
+ 2C{C + l)P[Bi(m, 2p) > C] 
6 + 64m(n|^"^ 



n 2\\h\\2 



Now, Theorem 12.5} with a, /3 := f3 and f3' as above, and the rough bounds 

cr it 
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yields 

d K (jSf(W),N(0,l)) 
< 12(a/3 + l)p' + P~ a2 



12n 2 \\h\\ 2 

+ " 11 (P[\G\ > a] + P[|D| > /3] + P[|D'| > /?'] 



a 2 



48n(C + l) 5 ||A/i|| 3 (C + mp + a 2 /n) 32n(C + 1) 4 || Ah\\ 3 {C + mg) 
a -3 c 3 
12(12 + 161n(n||/i||) + 641n(n||/i||) 2 ) 



+ 



80n(C + l) 5 ||A/i|| 3 (C + mp + a 2 /n) 144(1 + 11 ln(n||/i||) 2 ) 



< i ^ : + 

which, after plugging in C and with some further straightforward manipu- 
lations and estimates, yields the final bound. □ 

4.3. Neighbourhood statistics of a fixed number of uniformly dis- 
tributed points. Consider the space J = [0, n 1 ^) 8 for some integer d ^ 1 
and let X±, . . . ,X n be n i.i.d. points uniformly distributed on J . Let tp be 
a measurable real-valued functional defined on all pairs (x, X) where x G 
and A* C J is a finite subset. Such statistics have been investigated in 
many places for specific choices of ip. If the number of points is Poisson 
distributed, approaches using local dependence can be successfully applied. 
But if the number of points is fixed, besides the local dependence also global 
weak dependence has to be taken into account. 

Assume that ip is translation invariant, i.e. ip{x + y, X + y) = tp(x,X), 
where we assume the torus convention for translations. Assume that ip has 
influence radius p, i.e. for each x S J and each X C J we have 

ip(x, X) = ip(x, X n B p (x)), 

where B p {x) is the closed sphere of radius p and center x under the toroidal 
Euclidean metric. To avoid self-overlaps, assume p < ^n 1 ^. Let X := 
{Xi, . . . , X n }, define U = Ylxex ^i x i X), cr 2 = Var U and W = Ufa, where 
we assume that E^(Xi, X) = 0. 

Theorem 4.4. Assume W is defined as above and assume also that p 
7r _1 / 2 r(l + d/2) l / d . Then there is a universal constant only depending 
on d such that 

dw( ^),N(0,l)H«^. 

In the case where d, p and H^H remain fixed as n — > oo we have from 
Penrose and Yukichl ( 200ll ) that a 2 x n as long as Vax(ip(Xx 1 X)) > 0; in 



that case, Theorem 14.41 gives the best possible order n _1//2 for the Wasser- 
stein metric. 

After constructing an extended Stein coupling, the proof of the theorem 
essentially amounts to bounding the third moments of some mixed binomial 
distributions. Constants could be easily extracted from the proof but with 
the rough bound given here they are too large to be of practical use. 
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Proof of Theorem \4-4\ We first make a simple, but key observation, which 
will be used in the proof: 

Fact B. Let y be a fixed number of points distributed uniformly and inde- 
pendently of each other on an open subset J 1 C J . For any other open 
subset U C S, the joint distribution of points of y on U and U c is only 
connected through the number of points in these two respective subsets via 
the equation | C7 Pi + | t/ c Pi = |3^|- That is, given the total number 
and the number of points in one of the subsets, U, say, possibly along with 
their locations on U Pi J7 7 , the remaining points in the subset U c are then 
distributed uniformly and independently of each other on U c Pi J' . 

We first construct a Stein coupling according to Construction I2AI Note 
that W[ constructed below will not have the same distribution as W . Let 
to this end I = i and Xj = X{ be given. Let 

N iA = {XH B p { Xi j) \ { Xi } 

be the points within radius p of Xj, excluding X{. Let Ni 2 be an (iV^il 
number of points uniformly distributed on B c p {xi). Define the new set 

X! = [Xn B c p { Xi j) U N l>2 ; 

that is, remove Xi from X and replace the neighbouring points of X{ by the 
new points N^. Note that \X(\ = n — 1. Now, let 

xex> 

As ip(xi, X) = ip(xi,X P Bp(xi)) and using Fact B, we have that is inde- 
pendent of ip(xi,X). 

Randomising now over / and Xj, define U' = U[, W = Uj/a and G = 
—nip(Xi,X). Then, from Construction [2Al we have that (W, W',G) is a 
Stein coupling. Indeed, using the above mentioned independence, 

E{i>(Xi,X) | Ul} = 0, 

and hence (|3.13|) is satisfied. 

We now construct an extension W" of the basic Stein coupling, which will 
be independent of G and U' — U. Let again I = i and Xi = x^ be given. 
First, we define some further sets. Define 

M iA = B 2p (xi)U (j B p {x) 

and denote by = XC\ (M^i \ B p {xi)) all the points of X which remained 
fixed during the perturbation, but whose values of tp were potentially af- 
fected, either because of points being removed or points being added in 
their neighbourhood. We can now write 

A l :=Ul-U= ^ ^(x,X()+ J2 {^(x,Xl)-^x,X)) 

(4.7) 

- 1>(x,X)-il>(xi,X). 
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Let 



1/ 



i,2=B 3p (xi)U \J B 2p {x). 
xeN i>2 

Clearly, the values of A, and ip(xi,X) are determined by the points XC\Mi j2 
and X' n Mj 5 2 only. Given the points X n Mi f2 and X' n Mi 2, we have from 
Fact A that the remaining n— \Xf]Mi^\ points of X are distributed uniformly 
and independently of each other on Mf 2 . Hence we can use them for our 
construction of W" as follows. First, let iVj j4 be a Bi(n, Vol(Mj i 2)/n) number 
of points distributed independently and uniformly on Mi >2 - Then, let iVj 5 be 
the remaining n — |-/Vj i4 | points on Mf 2 distributed as follows. Let P(k) be 
k random points distributed uniformly on Mf 2 if > 0, and let, for k < 0, 
P(k) be \k\ randomly chosen points from X D Mf 2 (without replacement); 
set P(0) = 0. Let P := P(\X n M i>2 \ - \N iA \) and'define 



N, 



i,5 



(XnM? 2 )\P if |iv ii4 | > \xnM it2 \, 

(XnM° 2 )UP if |A^, 4 | ^ \xnM ii2 \; 

that is, we let JVj 5 be the points Xf}M? 2 and add or remove as many points 
(that is the points P) as needed so that | JV^ 4] + |JVi 5 1 = re. Now, setting 
Af/' = Ni^UNifi, we clearly have that [/" = X^e-*"-" ^(x^i) ^ s independent 
of (ip(xi, X), A) as the conditional distribution of X" given (A? n Mi )2 , A^' n 
Mj^) always equals to JC(X) by construction and (A? n Mi t 2,X- D Mj^) 
determines (i/j(xi, X), Aj) as mentioned before. Randomizing over I and 
X 7 , set £7" = and W = U"/a. Thus, (W.W'jG.W) satisfies the 
assumptions of Corollary 12.41 when setting D = D. It remains to find the 
corresponding quantities A and B. 
Define the sets 

M it3 = B 4p ( Xi ) U (J B 3p (aO, 



M, 



i,4 



.7; 



and let 



Y 2 
Y 3 
Y,i 



= Wi,i\ = \N it2 \ 

= \XnM i ^\-l-Y 1 , 

\XnM i ,\-l-Y 1 -Y 2 , 



Xn(B p {xi)\{xi})\ 
Xn(Mi, 2 \B p ( Xi ))\ 

xn(M^\M it2 )\ = 
N iA \ = \xl'nM ij2 \. 

Let K p := Vol(Bp(0)). Then the following statements are straightforward: 
(i) J?(Yi) = Bi(n-l,/c p /n), 
(u) J?(Y 2 + Y 3 \N i>2 ) = Bi(n - 1 - Y u 

{in) &(Y A \N i>2 ) = Bi(n, Vol(M ii2 )/re), 

(iv) Y A ±(Y 2 ,Y 3 ) given N i>2 , 

(v) \P\ ^\XH M i>2 \ + \X[' nM i>2 \ = l + Y 1 + Y 2 + Y 4 , 

(vi) \X(' n M i<3 \ ^Y 4 + Y 3 + \P\^1 + Y 1 + Y 2 + Y 3 + 2Y 4 , 



Vol(M,,3\Bp(x I )) 
(n— k p ) 
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(vii) | (X" \ P) n M i)4 1 ^ I X" n Mi, 3 1 + >5 and 

K* \ P) n m m | |*< n M i)3 | + y 5j 

for some F 5 with J^P, iV i)2 , N iA ) ~ Bi(n - y - 1, JvoSi) ) • 

The bound in (fi) follows from the fact that the difference between the 
points X n (Mi f 3 \ Mj )2 ) and A"' n (Mj i3 \ M i)2 ) is at most the points from P. 
The somewhat rough bound (vii) is due to the fact that the neighbourhoods 
of the points in P may overlap with Mi 3 . 

Let now C be a constant that may depend only on the dimension d but 
can change from formula to formula. From representation (|4.7p . we easily 
obtain 

|A| ^ MO^M + 2|iv 4 , 3 | + |AM + < GiiviKy + y + 

Now, we can write := U" — U as 

xeAr/'n(M ii4 \Afi,3) xexn{M iA \M i>3 ) 

hence 

< IMI(l*"nM ii3 | + |^nM ij3 | + \xl'nM iA \ + \xnM i>4 \)/a 

< C||^||Z/(7, 

where Z := 1 + Y x + y 2 + F 3 + Y 4 + y 5 . 

To estimate the third absolute moment of Z note first that if Y ~ Bi(m,p) 
with mp ^ 1 we have 

Ey 3 < 5m V • 

Note that the assumption p ^ vr" 1 / 2 r(l + ( i/2) 1 / d implies that Ey = k p ^ 1 
because k p = p d m = p d ir d / 2 /T(l + d/2). 

As the cubic function is convex on the non-negative half line, we have for 
any non- negative numbers a±, . . . , a m that 



(01 H + a m f ^ m\a\ H + a 6 m ). 

From (i) we immediately obtain 

Ey x 3 s$ 5k 3 . (4.8) 

From (ii)—(iv) we have that, given Y±, Y2 + Y3 + Y4 is stochastically domi- 
nated by Bi(2n, 2 Vol(Mj j3 )/n) which is further stochastically dominated by 
Bi(2ra, (2k4 P + 2Ac 3/9 y)/n), where we define Bi(m,p) := Bi(m, 1) if p > 1; 
hence, using this and (|4.8p . 



E(y 2 + y 3 + y) 3 ^ c^(k Ap + K3p y) 3 < c(k 3 p + (4.9) 

Note now that, given P, iVj 2 and iV»4, we have from (vii) that ^f(ys) is 
stochastically dominated by Bi(n — 1 — Y\, K p \P\/(n — K Ap — k^ p Yi) + ) which, 
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by (v), can be dominated by Bi(n — 1 — Y\, k p (1 + Y\ + Yz + Y±)/(n — K 4p — 
K3 p Yi ) + ); hence, using this, (|4.8p and (|4.9p . we obtain 

f ( k 3 J1 + Y 1 + Y 2 + Y 4 )' 



(n - K 4p - K3 p yi)-t 

< nP[y > cn] + Ck 3 e(i + y + y 2 + y 4 ) 3 

for n large enough and c such that n — K 4p — cnn^p ^ n/2, e.g. c = 1/(4k3 P ) 
and n ^ 4ft 4 p. From the Chernoff-Hoeffding inequality we obtain that 
P[y ^ cn] ^ 2~ cn for cn large enough. Hence, nP[y ^ nc] ^ C ^ Ck 3 
and 

pyf ^ Ck 3 (i + Ey x 3 + E(y + y 3 + y 4 ) 3 ) 

^(kJ + kJ + K?). 

Putting (|4.8 p - (|4.10p together we obtain 

ez 3 ^ c(i + Ey x 3 + E(y 2 + y 3 + y 4 ) 3 + Ey 5 3 

sC C(l + 4 + K 6 p + K 9 p ) ^ Ck% 

as k p ^ 1. Hence 



(4.10) 



1 n 

E|L>'| 3 = -J2 E| A'l 3 ^ CIIV'II 3 ^ 3 =: A z 
Furthermore, we have 



n 

i=l 



E|G| 3 < n 3 !^!! 3 /^ 3 =: B 3 . (4.11) 
Now, Corollary 12.41 yields 

d w (j2?(W),N(0,l)) < 5A 2 B ^ Cn\\i;fK 6 p /a 3 . 
By recalling that K p = p d K\ we obtain the final bound. □ 

4.4. Susceptibility and related statistics in the sub-critical Erdos- 
Renyi random graph. Let H be a graph (here, we use the letter 'If ' for 
graphs as the letter 'G" is used in the context of Stein couplings). Then, 
susceptibility x(H) is defined to be the expected size of the component 
containing a uniformly chosen random vertex. That is, if Cj C H, % = 
1, . . . , K are the K maximal subgraphs of the graph H, 

X(H) = E^\ C ^ ( 4 - 12 ) 

i=l 

where \Ci\ denotes the number of vertices in subgraph Cj. Using a different 
normalisation in f)4.12[) we can also write 

j2—— = -x(H), (4.13) 

4 = 1 

which, hence, is the probability that, for a given graph H, two randomly cho- 
sen vertices are in the same component, or, equivalently, connected through 
a path. Note that, as H is random, the probability of being connected is 
therefore itself a random quantity. 
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Using a martingale CLT it was proved by Janson and Luczak ( 20081 ) that, 



if H is a sub-critical Erdos-Renyi graph, x(-ff) is asymptotically normal. 
However, it is well known that standard martingale CLTs will often not yield 
optimal rates of co nvergence with respect to the Kolmogorov metric (see e.g. 
Bolthausen ( 19821 )). As often the actual dependence st ructure under con- 



sideration is much weaker than in a worst-case scenario, Rinott and RotaPI 



(1998) provide bounds that incorporate expressions to measure the depen- 



dency and which yield optimal bounds in many settings, but the involved 
quantities are typically hard to calculate. Let us derive here some bounds 
without using martingales and which are optimal up to some additional 
logarithmic factors. 

Let us consider here more general statistics that depend only on the prop- 
erties of the components of two randomly chosen vertices. For a vertex 
i G V(H) denote by C(H,i) C H the component (i.e. the maximal sub- 
graph) containing i. Let h = h(H,i,j) be a function that is determined by 
i and j and the two components that contain i and j, respectively, that is, 

h(H,i,j) = h(H n (C(H,i) U C(H,j)),i,j), for all H, i and j, (4.14) {64} 

and which is symmetric in the sense that 

h(H,i,j) = h(H,j,i), for all H, i and j. (4.15) {65} 

Then define 

Vifl):= E E h (H,i,j). (4.16) {66} 

ieV{H)j£V(H) 

We can recover susceptibility (|4.12|) from (|4.16p (up to a normalising con- 
stant) simply by choosing 

h(H,i,j) = I[i and j are in the same component]. (4-17) {67} 

As there is little hope for a normal limit for general h we consider functions 
that are non-zero only if the two random vertices are in the same component, 
that is, 

C(H,i)^C(H,j) h(H,i,j)=0. (4.18) {68} 

Hence, under (|4.18|) . we can write (|4.16p also as 

U ( H ) = E E h (H,i,j). (4.19) {69} 

iev(-ff) jeC(H,i) 

Dependence on just one vertex and its component is, of course, also covered 
as a special case h(H, i, i) = ho(H, i) and h(H, i,j) = if i 7^ j. For example 
ho(H,i) = l[\C(H,i)\ = 1] yields the total number of singletons, or, more 
generally, choose ho(H, i) = l[C(H, i) > mo] to obtain the number of vertices 
that are in a component of size larger than mo- We can also recover (|4,12D 
by setting ho(H,i) = \C(H,i)\ 2 /n, however this function is not Lipschitz in 
the size of the component. Other quantities of interest may be obtained, 
such as 

h(H,i,j) = I[i and j are connected, but not further than mo apart], 
h(H,i,j) = l[i and j are in the same cycle], 
h(H,i,j) = I[i and j are connected] / \C(H, 
Note that by summing uniformly over all the vertices, the components are size-biased. However, we can 'de-bias' as follows: for a given $h_0(H,i)$ of interest, $\tilde h_0(H,i) = h_0(H,i)/|C(H,i)|$ gives the corresponding unbiased result, where the sum now runs uniformly over the components. As $|C(H,i)| \geq 1$, we have $\|\Delta \tilde h_0\| \leq \|\Delta h_0\|$. However, averaging is only possible with respect to the expected number of components and not with respect to the actual number, because dividing $\tilde h_0(H,i)$ by the actual number of components yields a function that does not satisfy (4.14).

We have the following theorem. 

Theorem 4.5. Let $U$ be as in (4.16) for some function $h$ satisfying (4.14), (4.15) and (4.18). Let $W = (U(H) - \mu)/\sigma$, where $\mu = \mathbb{E}U(H)$ and $\sigma^2 = \operatorname{Var} U(H)$. Let

$$\|h\| = \sup_{H} \sup_{i,j \in [n]} |h(H,i,j)| \geq 1$$

and, with $H^{k,l}$ denoting the graph where an additional edge is added between $k$ and $l$ if $k \neq l$,

$$\|\Delta h\| = \sup_{i,j,k,l \in [n]} \sup_{H} \big|h(H^{k,l},i,j) - h(H,i,j)\big| \geq 1.$$

Assume that $|h(H,i,j)| \leq 2\|\Delta h\|$ whenever $H$ is such that the components which contain $i$ and $j$, respectively, are singletons. Let $H$ be an Erdős-Rényi random graph with $n$ vertices and edge probability $\lambda/n < 1/n$. Then there is a universal constant $K > 0$ such that

$$d_K\big(\mathcal{L}(W), N(0,1)\big) \leq K\left(\frac{n\,\ln(n\|h\|)^{11}\,\|\Delta h\|^3\,\big(1 + 1/a^2 + \sigma^2/n\big)}{\lambda\,a^{11}\,\sigma^3} + \frac{e^{a}\ln(n\|h\|)^2}{\lambda\,a^2\,\sigma^2}\right)$$

whenever

$$a := \lambda - 1 - \log\lambda \leq 4\ln(n\|h\|). \tag{4.20}$$

Let now $\lambda < 1$ be fixed and let $h$ be as in (4.17), so that $U$ is the susceptibility (up to a normalising constant). Clearly, $\|h\| = 1$ and $\|\Delta h\| = 1$. From Janson and Łuczak (2008) we have that $\operatorname{Var} U(H)$ is of order $n$; hence Theorem 4.5 yields a Kolmogorov bound of order $\log(n)^{11}/\sqrt{n}$.

Proof of Theorem 4.5. We will make use of Construction 2A to obtain a Stein coupling. Throughout, we consider the vertex set $[n]$. Let $C_V(H,i)$ denote the vertices of $C(H,i)$. If $e = \{k,l\} \subset V(H) = [n]$ is a potential edge in a graph $H$ on $[n]$ and $K \subset [n]$, write $e \sim K$ if $k \in K$ or $l \in K$ (or both). Write $e \not\sim K$ if $k \notin K$ and $l \notin K$.

Let now $\xi = (\xi_{\{i,j\}})_{i,j \in [n],\, i \neq j}$ be an i.i.d. family of $\mathrm{Be}(\lambda/n)$ distributed random variables. Let $H$ be the graph on the vertex set $[n]$ with edge set $\xi$, that is, $e$ is an edge in $H$ if and only if $\xi_e = 1$. Let $H^*$ be an independent copy of $H$. Note that, given $C(H,i)$, the family $(\xi_e)_{e \not\sim C_V(H,i)}$ is i.i.d. $\mathrm{Be}(\lambda/n)$. Define the random graph $H'_i$ through

$$e \in E(H'_i) \iff \begin{cases} e \in E(H^*) \text{ and } e \sim C_V(H,i), \text{ or}\\ e \in E(H) \text{ and } e \not\sim C_V(H,i). \end{cases}$$

As, given $C(H,i)$, the graph $H'_i$ is always an unconditional Erdős-Rényi random graph by construction, $C(H,i)$ is independent of the graph $H'_i$.

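Now, let $I$ be uniformly and independently distributed on $[n]$. Set $H' = H'_I$, $U = U(H)$ and $U' = U(H')$. Furthermore, let $W = (U - \mu)/\sigma$, $W' = (U' - \mu)/\sigma$ and

$$G = -\frac{n}{\sigma}\Big(\sum_{j \in C(H,I)} h(H,I,j) - \tilde\mu\Big),$$

where $\tilde\mu := \mathbb{E}\sum_{j \in C(H,1)} h(H,1,j)$. Then it is easy to check that $(W, W', G)$ satisfies the defining identity of a Stein coupling, because $C(H,i)$ is independent of $H'_i$ and $n\tilde\mu = \mu$ by exchangeability of the vertices. Hence we have a Stein coupling.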
Let now

$$V_i := \bigcup_{k \in C_V(H,i)} C_V(H'_i,k)$$

be the set of all those vertices of the graph $H$ whose components are affected by changing from $H$ to $H'_i$. These are, on the one hand, the vertices $C_V(H,i)$ themselves and, on the other hand, the vertices in $[n] \setminus C_V(H,i)$ that got connected in $H'_i$ to at least one of the vertices in $C_V(H,i)$.
Then we can write

$$\Delta_i := U(H'_i) - U(H) = \sum_{k \in V_i}\Big(\sum_{l \in C_V(H'_i,k)} h(H'_i,k,l) - \sum_{l \in C_V(H,k)} h(H,k,l)\Big).$$

Note that the subgraphs $H \cap V_i$ and $H'_i \cap V_i$ determine the value of $\Delta_i$. Hence, as before, let $H^{**}$ be an independent copy of $H$ and define the graph $H''_i$ by

$$e \in E(H''_i) \iff \begin{cases} e \in E(H^{**}) \text{ and } e \sim V_i, \text{ or}\\ e \in E(H) \text{ and } e \not\sim V_i. \end{cases}$$

By the same argument as before, $H''_i$ is independent of $H \cap V_i$ and $H'_i \cap V_i$. Define

$$V'_i := \bigcup_{k \in V_i} C_V(H''_i,k)$$

to be the set of all those vertices of the graph $H$ whose components are affected by changing from $H$ to $H''_i$. We can write

$$\Delta'_i := U(H''_i) - U(H) = \sum_{k \in V'_i}\Big(\sum_{l \in C_V(H''_i,k)} h(H''_i,k,l) - \sum_{l \in C_V(H,k)} h(H,k,l)\Big).$$

We are now in a position to apply Theorem 2.5. As $(W, W', G)$ is a Stein coupling, we have $r_0 = 0$. Setting $\tilde D = D$ and $S = 1$, we have $r_1 = r_2 = 0$ and, by construction of $H''$, also $r_3 = 0$.

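Let $C > 0$ be chosen later. From Durrett (2007, p. 38) we have that

$$\mathbb{P}\big(|C(H,1)| > k\big) \leq \frac{e^{-ak}}{\lambda}. \tag{4.21}$$

Hence, we obtain the simple estimate

$$\mathbb{E}|C(H,1)|^2 = \int_0^\infty \mathbb{P}\big[|C(H,1)| > \sqrt{x}\big]\,dx \leq \frac{1}{\lambda}\int_0^\infty e^{-a\sqrt{x}}\,dx = \frac{2}{\lambda a^2}.$$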
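For completeness, the last integral in this estimate is evaluated with the substitution $x = y^2$, $dx = 2y\,dy$. Recall that $a > 0$ because $\lambda < 1$:

$$\int_0^\infty e^{-a\sqrt{x}}\,dx = \int_0^\infty 2y\,e^{-ay}\,dy = \frac{2}{a^2}.$$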



Note also that, if $j \in C_V(H,i)$,

$$|h(H,i,j)| \leq \|\Delta h\|\,|C(H,i)|.$$

This yields $|\tilde\mu| \leq \|\Delta h\|\,\mathbb{E}|C(H,1)|^2 \leq 2\|\Delta h\|/(\lambda a^2)$. Furthermore, if $j \in V_i \setminus C_V(H,i)$, then

$$|C(H,j)| \leq |C(H'_i,j)|.$$

With $\alpha := \frac{n}{\sigma}\|\Delta h\|\big(C^2 + \frac{2}{\lambda a^2}\big)$ we have

$$\mathbb{P}[|G| > \alpha] \leq \mathbb{P}[|C(H,I)| > C] + \mathbb{P}[|G| > \alpha,\ |C(H,I)| \leq C].$$

The last term is zero, so the right-hand side equals $\mathbb{P}[|C(H,1)| > C]$.
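Furthermore, with $\beta := 2C^4\|\Delta h\|/\sigma$,

$$\begin{aligned}
\mathbb{P}[|D| > \beta] &\leq \mathbb{P}[|C(H,I)| > C] + \mathbb{P}[|D| > \beta,\ |C(H,I)| \leq C]\\
&\leq \mathbb{P}[|C(H,I)| > C]\\
&\quad + \mathbb{P}[|C(H,I)| \leq C,\ \exists j \in C_V(H,I) \text{ s.t. } |C(H'_I,j)| > C]\\
&\quad + \mathbb{P}[|D| > \beta,\ |C(H,I)| \leq C,\ |C(H'_I,j)| \leq C \text{ for all } j \in C_V(H,I)].
\end{aligned}$$

The last term is zero, so this is

$$\leq \mathbb{P}[|C(H,1)| > C] + C\,\mathbb{P}[|C(H,1)| > C] = (C+1)\,\mathbb{P}[|C(H,1)| > C].$$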

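Finally, with $\beta' := 2C^5\|\Delta h\|/\sigma$,

$$\begin{aligned}
\mathbb{P}[|D'| > \beta'] &\leq \mathbb{P}[|C(H,I)| > C] + \mathbb{P}[|D'| > \beta',\ |C(H,I)| \leq C]\\
&\leq \mathbb{P}[|C(H,I)| > C]\\
&\quad + \mathbb{P}[|C(H,I)| \leq C,\ \exists j \in C_V(H,I) \text{ s.t. } |C(H'_I,j)| > C]\\
&\quad + \mathbb{P}[|D'| > \beta',\ |C(H,I)| \leq C,\ |C(H'_I,j)| \leq C\ \forall j \in C_V(H,I)]\\
&\leq \mathbb{P}[|C(H,1)| > C] + C\,\mathbb{P}[|C(H,1)| > C]\\
&\quad + \mathbb{P}[|C(H,I)| \leq C,\ |C(H'_I,j)| \leq C\ \forall j \in C_V(H,I),\ \exists j \in V_I \text{ s.t. } |C(H''_I,j)| > C]\\
&\quad + \mathbb{P}[|D'| > \beta',\ |C(H,I)| \leq C,\ |C(H'_I,j)| \leq C\ \forall j \in C_V(H,I),\ |C(H''_I,j)| \leq C\ \forall j \in V_I].
\end{aligned}$$

The last term is zero, so this is

$$\leq \mathbb{P}[|C(H,1)| > C] + C\,\mathbb{P}[|C(H,1)| > C] + C^2\,\mathbb{P}[|C(H,1)| > C] \leq (C+1)^2\,\mathbb{P}[|C(H,1)| > C].$$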

We also have the rough bounds $|G| \leq 2n^2\|h\|/\sigma$, $|D| \leq 2n^2\|h\|/\sigma$ and $|D'| \leq 2n^2\|h\|/\sigma$.
Hence, choosing $C = 8\ln(n\|h\|)/a - 1$, we have $C \geq 1$ by (4.20). Using (4.21), we then obtain

$$\mathbb{P}[|G| > \alpha] \leq \frac{e^{-aC}}{\lambda} = \frac{e^{a}}{\lambda\,(n\|h\|)^8}, \qquad
\mathbb{P}[|D| > \beta] \leq \frac{(C+1)\,e^{a}}{\lambda\,(n\|h\|)^8}, \qquad
\mathbb{P}[|D'| > \beta'] \leq \frac{(C+1)^2\,e^{a}}{\lambda\,(n\|h\|)^8}.$$

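From Theorem 2.5 we hence have

$$\begin{aligned}
d_K\big(\mathcal{L}(W), N(0,1)\big) &\leq 12(\alpha\beta + 1)\beta' + 8\alpha\beta^2 + \frac{4n^4\|h\|^2}{\sigma^2}\Big(\mathbb{P}[|G| > \alpha] + \mathbb{P}[|D| > \beta] + \mathbb{P}[|D'| > \beta']\Big)\\
&\leq 12\Big(\frac{n\|\Delta h\|}{\sigma}\Big(C^2 + \frac{2}{\lambda a^2}\Big)\frac{2C^4\|\Delta h\|}{\sigma} + 1\Big)\frac{2C^5\|\Delta h\|}{\sigma} + 8\,\frac{n\|\Delta h\|}{\sigma}\Big(C^2 + \frac{2}{\lambda a^2}\Big)\frac{4C^8\|\Delta h\|^2}{\sigma^2}\\
&\quad + \frac{4n^4\|h\|^2}{\sigma^2}\big(1 + (C+1) + (C+1)^2\big)\frac{e^{a}}{\lambda\,(n\|h\|)^8}.
\end{aligned}$$

We have $C \geq 1$, $\|h\| \geq 1$, $\|\Delta h\| \geq 1$, $\lambda < 1$ and $C + 1 = 8\ln(n\|h\|)/a$. Therefore each of these terms is bounded by a universal constant times the right-hand side of the bound in Theorem 4.5. □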
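The Stein coupling used in this proof can be checked exactly on a very small example. The following Python sketch is an illustration only, not part of the original argument. It enumerates all graphs on $n = 4$ vertices and takes $h$ as in (4.17). It builds $H'_i$ by taking every edge that touches $C_V(H,i)$ from an independent copy $H^*$. It then compares $\mathbb{E}\{Wf(W)\}$ with $\mathbb{E}\{G(f(W') - f(W))\}$ for $G = -\frac{n}{\sigma}\big(\sum_{j \in C(H,I)} h(H,I,j) - \tilde\mu\big)$. The values $n = 4$, $\lambda = 0.5$ and $f = \sin$ are arbitrary assumptions chosen so that exhaustive enumeration is cheap. All helper names are hypothetical.

```python
import itertools
import math


def components(n, edges):
    """For each vertex v, the frozenset of vertices in its component (union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for k, l in edges:
        parent[find(k)] = find(l)
    blocks = {}
    for v in range(n):
        blocks.setdefault(find(v), set()).add(v)
    return [frozenset(blocks[find(v)]) for v in range(n)]


def susceptibility_u(n, edges):
    """U(H) with h as in (4.17), i.e. the sum over vertices i of |C(H, i)|."""
    return sum(len(c) for c in components(n, edges))


def resample_around(edges, star_edges, vertices):
    """H'_i: edges touching `vertices` are taken from H*, all other edges from H."""
    def touches(e):
        return e[0] in vertices or e[1] in vertices
    return frozenset(e for e in edges if not touches(e)) | frozenset(
        e for e in star_edges if touches(e))


def check_stein_identity(n=4, lam=0.5, f=math.sin):
    """Exact (E[W f(W)], E[G (f(W') - f(W))]) by enumerating all graphs on n vertices."""
    p = lam / n
    pairs = list(itertools.combinations(range(n), 2))
    graphs = []
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        e = frozenset(pr for pr, bit in zip(pairs, bits) if bit)
        graphs.append((e, p ** len(e) * (1 - p) ** (len(pairs) - len(e))))
    u = {e: susceptibility_u(n, e) for e, _ in graphs}
    mu = sum(pr * u[e] for e, pr in graphs)
    sigma = math.sqrt(sum(pr * (u[e] - mu) ** 2 for e, pr in graphs))
    mu_tilde = mu / n  # E sum_{j in C(H,1)} h(H,1,j), by exchangeability
    lhs = rhs = 0.0
    for e, pr in graphs:
        comp = components(n, e)
        w = (u[e] - mu) / sigma
        lhs += pr * w * f(w)
        for i in range(n):  # I uniform on the vertex set
            g = -(n / sigma) * (len(comp[i]) - mu_tilde)
            for e_star, pr_star in graphs:  # independent copy H*
                w_prime = (u[resample_around(e, e_star, comp[i])] - mu) / sigma
                rhs += pr * pr_star / n * g * (f(w_prime) - f(w))
    return lhs, rhs
```

Both returned numbers should agree up to rounding error. With $f(x) = x$, both sides equal $\mathbb{E}W^2 = 1$, which is the identity $\mathbb{E}(GD) = 1$.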



Remark 4.2. Note that $(H, H'_I)$ does not form an exchangeable pair, although the marginal distributions are the same. To see this, denote by $G_c$ the complete graph and by $G_0$ the empty graph on the vertices $[n]$, where we assume $n > 2$. Given $H = G_c$, $H'_I$ is just an independent realisation of the Erdős-Rényi random graph, hence

$$\mathbb{P}[H'_I = G_0 \mid H = G_c] = \mathbb{P}[H'_I = G_0] = (1-p)^{\binom{n}{2}} > 0, \qquad p = \lambda/n.$$

On the other hand,

$$\mathbb{P}[H = G_0 \mid H'_I = G_c] = \frac{\mathbb{P}[H'_I = G_c \mid H = G_0]\,\mathbb{P}[H = G_0]}{\mathbb{P}[H'_I = G_c]} = 0,$$

since $H'_I$ cannot be a complete graph if $H$ is empty.
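The asymmetry in Remark 4.2 can also be seen numerically. The sketch below is in Python, with hypothetical helper names; $n = 3$ and $\lambda = 1$ are arbitrary choices. It computes the joint probabilities $\mathbb{P}[H = G_c, H'_I = G_0]$ and $\mathbb{P}[H = G_0, H'_I = G_c]$ exactly by enumerating $I$ and the independent copy $H^*$. Exchangeability would force the two to be equal.

```python
import itertools


def components(n, edges):
    """For each vertex v, the frozenset of vertices in its component (union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for k, l in edges:
        parent[find(k)] = find(l)
    blocks = {}
    for v in range(n):
        blocks.setdefault(find(v), set()).add(v)
    return [frozenset(blocks[find(v)]) for v in range(n)]


def resample_around(edges, star_edges, vertices):
    """H'_i: edges touching `vertices` come from H*, all remaining edges from H."""
    def touches(e):
        return e[0] in vertices or e[1] in vertices
    return frozenset(e for e in edges if not touches(e)) | frozenset(
        e for e in star_edges if touches(e))


def joint_probability(n, lam, first, second):
    """Exact P[H = first, H'_I = second] with I uniform and H* an independent copy."""
    p = lam / n
    pairs = list(itertools.combinations(range(n), 2))

    def prob(e):
        return p ** len(e) * (1 - p) ** (len(pairs) - len(e))

    all_graphs = [frozenset(pr for pr, bit in zip(pairs, bits) if bit)
                  for bits in itertools.product((0, 1), repeat=len(pairs))]
    comp = components(n, first)
    total = 0.0
    for i in range(n):
        for star in all_graphs:
            if resample_around(first, star, comp[i]) == second:
                total += prob(first) * prob(star) / n
    return total
```

Here $p = 1/3$, so the first probability is $p^3(1-p)^3 = 8/729$, while the second vanishes.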

5. Zero bias transformation 



Assume that $\mathbb{E}W = 0$ and $\operatorname{Var} W = 1$. It was proved in Goldstein and Reinert (1997) that there exists a unique distribution $\mathcal{L}(W^z)$ such that, for all smooth $f$, we have

$$\mathbb{E}\{Wf(W)\} = \mathbb{E}f'(W^z). \tag{5.1}$$

Furthermore, $\mathcal{L}(W^z)$ has a density $p$ with respect to the Lebesgue measure. Let us first discuss the connection between our general framework and $p$.

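Lemma 5.1. Let $(W, W', G)$ be a Stein coupling. Then, with $K(t) = G\big(I[0 \leq t < D] - I[D \leq t < 0]\big)$,

$$\mathbb{E}\big\{G\big(I[W \leq u < W'] - I[W' \leq u < W]\big)\big\} = \mathbb{E}K(u - W) = p(u) \tag{5.2}$$

for Lebesgue almost all $u \in \mathbb{R}$.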

Proof. The first equality is clear. To prove the second one, let $f$ be a bounded Lipschitz-continuous function. We have

$$Gf(W') - Gf(W) = \int_{\mathbb{R}} f'(W + t)K(t)\,dt = \int_{\mathbb{R}} f'(u)K(u - W)\,du,$$

so that, from (1.8) and using Fubini's theorem,

$$\mathbb{E}\{Wf(W)\} = \int_{\mathbb{R}} f'(u)\,\mathbb{E}K(u - W)\,du. \tag{5.3}$$

For any $a \in \mathbb{R}$ and $\delta > 0$ we may take $f(x) = \delta^{-1}\int_0^x I[a \leq y \leq a + \delta]\,dy$, and thus $f'(x) = \delta^{-1}I[a \leq x \leq a + \delta]$. We then have from (5.1) and (5.3) that

$$\int_a^{a+\delta} \mathbb{E}K(u - W)\,du = \int_a^{a+\delta} p(u)\,du,$$

which proves the claim since $a$ and $\delta$ are arbitrary. □

In Goldstein and Reinert (1997) and Goldstein and Reinert (2005b) a method was introduced to construct $\mathcal{L}(W^z)$ using an underlying exchangeable pair satisfying the linearity condition (3.2) with $R = 0$. Although the construction itself does not directly lead to a coupling of $W^z$ with $W$ (which is what we ultimately want), it can nevertheless suggest ways to find such couplings; see for example Goldstein (2005) and Ghosh (2009).

We can generalize the idea to our setting. However, as some of the involved measures may become signed measures, we have to proceed with more care. Denote by $F$ the probability measure on $\mathbb{R}^2$ induced by $(W, W')$. With

$$\varphi(w, w') := \mathbb{E}(GD \mid W = w, W' = w') = (w' - w)\,\mathbb{E}(G \mid W = w, W' = w'),$$

define a new (possibly signed) measure

$$d\hat F(w, w') = \varphi(w, w')\,dF(w, w').$$

We have $\int_{\mathbb{R}^2} d\hat F(w, w') = 1$ because $\mathbb{E}(GD) = 1$ from (1.8). Let now the space $\Omega := \mathbb{R}^2 \times [0,1]$ be equipped with the standard Borel $\sigma$-algebra and define the measure $Q = \hat F \otimes \ell$, where $\ell$ is the Lebesgue measure on $[0,1]$. Define the mapping $W^z : \Omega \to \mathbb{R}$ as

$$W^z(w, w', u) := uw' + (1 - u)w.$$

Clearly, $W^z$ is measurable. We now have the following.
Lemma 5.2. Assume that (1.8) holds. Then the measure $P^z$ on $\mathbb{R}$ induced by the mapping $W^z$ is a probability measure and, for every function $f : \mathbb{R} \to \mathbb{R}$ with bounded derivative, we have

$$\mathbb{E}\{Wf(W)\} = \int_\Omega f'\big(W^z(w, w', u)\big)\,dQ(w, w', u) = \int_{\mathbb{R}} f'(x)\,p(x)\,dx. \tag{5.4}$$
Proof. Let us prove the first equality of (5.4):

$$\begin{aligned}
\int_\Omega f'\big(W^z(w, w', u)\big)\,dQ(w, w', u) &= \int_{\mathbb{R}^2}\int_{[0,1]} f'\big(w + u(w' - w)\big)\,du\,d\hat F(w, w')\\
&= \int_{\mathbb{R}^2} \frac{f(w') - f(w)}{w' - w}\,d\hat F(w, w')\\
&= \int_{\mathbb{R}^2} \varphi(w, w')\,\frac{f(w') - f(w)}{w' - w}\,dF(w, w')\\
&= \mathbb{E}\{G(f(W') - f(W))\} = \mathbb{E}\{Wf(W)\}.
\end{aligned}$$

We have thus proved that the measure $P^z$, induced by $W^z$ and $Q$, satisfies

$$\mathbb{E}\{Wf(W)\} = \int_{\mathbb{R}} f'(x)\,dP^z(x). \tag{5.5}$$

Using the special functions from the proof of Lemma 5.1, it is clear from (5.5) and (5.1) that $p$ is the Radon-Nikodym derivative of $P^z$ with respect to the Lebesgue measure. This implies the second equality in (5.4). □



In the case where $\varphi(W, W') \geq 0$ almost surely, $\hat F$ is also a probability measure and we can write

$$W^z = UW' + (1 - U)W,$$

where $(W, W')$ has distribution $\hat F$ and $U \sim \mathrm{U}[0,1]$ is independent of $(W, W')$.
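As a small sanity check of Lemma 5.2, consider $W = X_1 + \cdots + X_n$ with independent signs $X_i = \pm 1/\sqrt{n}$, $I$ uniform on $\{1, \ldots, n\}$, $W' = W - X_I$ and $G = -nX_I$. This choice is an illustration, not taken from the text. One checks directly that it is a Stein coupling, because $X_i$ is centred and independent of $W - X_i$. Here $GD = nX_I^2 = 1$, so $\varphi \geq 0$ and $\hat F = F$. The Python sketch below uses the assumptions $n = 6$, $f = \sin$ and a midpoint rule with $m = 2000$ nodes. It computes $\mathbb{E}\{Wf(W)\}$ exactly by enumeration and compares it with $\int f'(W^z)\,dQ = \mathbb{E}\{GD\int_0^1 f'(UW' + (1-U)W)\,dU\}$.

```python
import itertools
import math


def zero_bias_check(n=6, m=2000):
    """Return (E[W f(W)], E[G D * int_0^1 f'(U W' + (1 - U) W) dU]) for f = sin.

    W = X_1 + ... + X_n with independent signs X_i = +-1/sqrt(n); the coupling is
    W' = W - X_I and G = -n X_I with I uniform (an illustrative Stein coupling).
    """
    f, f_prime = math.sin, math.cos
    c = 1 / math.sqrt(n)  # so that Var W = 1
    lhs = 0.0  # E[W f(W)]
    rhs = 0.0  # integral of f'(W^z) against the measure Q
    for signs in itertools.product((-1, 1), repeat=n):
        prob = 0.5 ** n
        x = [s * c for s in signs]
        w = sum(x)
        lhs += prob * w * f(w)
        for i in range(n):  # I uniform
            w_prime = w - x[i]
            g = -n * x[i]
            d = w_prime - w
            # midpoint rule for int_0^1 f'(u w' + (1 - u) w) du
            integral = sum(f_prime(((k + 0.5) / m) * w_prime + (1 - (k + 0.5) / m) * w)
                           for k in range(m)) / m
            rhs += prob / n * g * d * integral
    return lhs, rhs
```

The two values agree up to the quadrature error, which is at most of order $10^{-9}$ for these parameters.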

6. Proofs of main results 

6.1. Preliminaries. For a random variable $X$ define the truncated version $\bar X := (X \wedge 1) \vee (-1)$. For a real number $t$ define $t_+ = t \vee 0$ and $t_- = t \wedge 0$.

Stein's method for normal approximation is based on the differential equation

$$f'(w) - wf(w) = h(w) - \mathbb{E}h(Z), \tag{6.1}$$

where $Z \sim N(0,1)$. This equation can be solved for any measurable function $h$ for which $\mathbb{E}h(Z)$ exists. The solution $f_h$ is differentiable and, if $h$ is Lipschitz, $f'_h$ is also Lipschitz; see Stein (1986).

Lemma 6.1. Let $W$, $W'$, $W''$, $\tilde D$ and $G$ be square integrable random variables. Let $h$ be a measurable function for which $\mathbb{E}h(W)$ and $\mathbb{E}h(Z)$ exist and let $f$ be the solution to (6.1). Then

$$|\mathbb{E}h(W) - \mathbb{E}h(Z)| \leq (\|f\| \vee \|f'\|)\,r_0 + \|f'\|(r_1 + r_2 + r_3) + |\mathbb{E}R_1(f)| + |\mathbb{E}R_2(f)|,$$

where

$$R_1(f) = (G\tilde D - S)\big(f'(W'') - f'(W)\big), \tag{6.2}$$

$$R_2(f) = G\int_0^D \big(f'(W + t) - f'(W)\big)\,dt. \tag{6.3}$$

Proof. Let $f = f_h$ be the solution to (6.1). We can assume that $\|f\|$ and $\|f'\|$ are finite, since otherwise the statement is trivial. From the fundamental theorem of calculus, we have

$$f(W') - f(W) = \int_0^D f'(W + t)\,dt. \tag{6.4}$$

Multiplying (6.4) by $G$ and comparing it with the left-hand side of (6.1), we have

$$\begin{aligned}
h(W) - \mathbb{E}h(Z) = f'(W) - Wf(W) &= Gf(W') - Gf(W) - Wf(W)\\
&\quad + (S - G\tilde D)f'(W'')\\
&\quad + (1 - S)f'(W)\\
&\quad + G(\tilde D - D)f'(W)\\
&\quad + (S - G\tilde D)\big(f'(W) - f'(W'')\big)\\
&\quad - G\int_0^D \big(f'(W + t) - f'(W)\big)\,dt.
\end{aligned}$$

Taking expectation, the lemma is immediate. □
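The display in this proof is a pointwise algebraic identity once $f(W') - f(W)$ is expanded as in (6.4); only the final step uses expectations. One way to check this is to evaluate both sides for a concrete $f$. The Python sketch below uses $f = \sin$, for which $\int_0^D (f'(W+t) - f'(W))\,dt = \sin W' - \sin W - D\cos W$. It plugs arbitrary real numbers into the roles of $W$, $W'$, $W''$, $G$, $\tilde D$ and $S$. These choices and the function name are assumptions for illustration.

```python
import math


def decomposition_gap(w, w1, w2, g, d_tilde, s):
    """LHS minus RHS of the identity in the proof of Lemma 6.1, for f = sin.

    w, w1, w2 play the roles of W, W', W''; D = W' - W as in the text.
    """
    f, fp = math.sin, math.cos
    d = w1 - w
    # int_0^D (f'(W + t) - f'(W)) dt in closed form for f = sin
    r2_integral = f(w1) - f(w) - d * fp(w)
    lhs = fp(w) - w * f(w)
    rhs = (g * f(w1) - g * f(w) - w * f(w)
           + (s - g * d_tilde) * fp(w2)
           + (1 - s) * fp(w)
           + g * (d_tilde - d) * fp(w)
           + (s - g * d_tilde) * (fp(w) - fp(w2))
           - g * r2_integral)
    return lhs - rhs
```

The gap is zero up to floating-point rounding for any inputs, which confirms that the six terms on the right-hand side add up exactly to $f'(W) - Wf(W)$.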

6.2. Bound on the Wasserstein distance. If $h$ is Lipschitz continuous, then the solution $f$ to (6.1) is differentiable, $f'$ is Lipschitz, and we have the following bounds (see Stein (1972) and Raič (2004)):

$$\|f\| \leq 2\|h'\|, \qquad \|f'\| \leq \sqrt{2/\pi}\,\|h'\|, \tag{6.5}$$

$$\|f''\| \leq 2\|h'\|. \tag{6.6}$$

Proof of Theorem 2.1. The following bounds are easy to obtain using Taylor's theorem:

$$|\mathbb{E}R_1(f)| \leq 2r'_4\|f'\| + r'_5\|f''\|, \qquad |\mathbb{E}R_2(f)| \leq 2r_4\|f'\| + 0.5\,r_5\|f''\|.$$

Combining this with Lemma 6.1 and the bounds (6.5) and (6.6) for Lipschitz $h$ with $\|h'\| \leq 1$ proves the theorem. □

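6.3. Bounds on the Kolmogorov distance. If, for some $\varepsilon > 0$, $h$ is of the form

$$h_{a,\varepsilon}(x) = \begin{cases} 1, & \text{if } x \leq a,\\ 1 + (a - x)/\varepsilon, & \text{if } a \leq x \leq a + \varepsilon,\\ 0, & \text{if } x > a + \varepsilon, \end{cases} \tag{6.7}$$

then, from pages 23 and 24 of Stein (1986) (see also Chen and Shao (2004)), we have for $f_{a,\varepsilon}$ and for every $w, v \in \mathbb{R}$ the bounds

$$0 \leq f_{a,\varepsilon}(w) \leq \sqrt{2\pi}/4, \qquad |f'_{a,\varepsilon}(w)| \leq 1, \qquad |f'_{a,\varepsilon}(w) - f'_{a,\varepsilon}(v)| \leq 1 \tag{6.8}$$

and, in addition, for every $s, t \in \mathbb{R}$,

$$|f'_{a,\varepsilon}(w + s) - f'_{a,\varepsilon}(w + t)| \leq (|w| + 1)\min(|s| + |t|, 1) + \varepsilon^{-1}\int_{s \wedge t}^{s \vee t} I[a \leq w + u \leq a + \varepsilon]\,du \tag{6.9}$$

$$\leq (|w| + 1)\min(|s| + |t|, 1) + I[a - s \vee t \leq w \leq a - s \wedge t + \varepsilon]. \tag{6.10}$$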

Throughout this section let $\kappa := d_K\big(\mathcal{L}(W), N(0,1)\big)$. It is not difficult to see that for any $\varepsilon > 0$,

$$\kappa \leq \sup_{a \in \mathbb{R}}|\mathbb{E}h_{a,\varepsilon}(W) - \mathbb{E}h_{a,\varepsilon}(Z)| + 0.4\varepsilon, \tag{6.11}$$

so that we can use Lemma 6.1 for functions of the form (6.7). If $f_{a,\varepsilon}$ is the solution to (6.1), the bounds (6.8) thus give

$$\kappa \leq r_0 + r_1 + r_2 + r_3 + 0.4\varepsilon + \sup_{a \in \mathbb{R}}|\mathbb{E}R_1(f_{a,\varepsilon})| + \sup_{a \in \mathbb{R}}|\mathbb{E}R_2(f_{a,\varepsilon})|, \tag{6.12}$$

which will be the basis for further bounds on $\kappa$.
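The constant $0.4$ in (6.11) comes from two facts. First, $I[x \leq a] \leq h_{a,\varepsilon}(x) \leq I[x \leq a + \varepsilon]$. Second, the standard normal density is bounded by $1/\sqrt{2\pi} < 0.4$, so that $\mathbb{P}[a < Z \leq a + \varepsilon] \leq 0.4\varepsilon$. The following Python sketch (function names hypothetical) implements $h_{a,\varepsilon}$ from (6.7) and checks both facts on a grid of values of $a$.

```python
import math


def h_smooth(x, a, eps):
    """The piecewise-linear function h_{a,eps} of (6.7)."""
    if x <= a:
        return 1.0
    if x <= a + eps:
        return 1.0 + (a - x) / eps
    return 0.0


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_cdf_increment(eps, grid):
    """max over a in grid of P[a < Z <= a + eps]; bounds E h_{a,eps}(Z) - P[Z <= a]."""
    return max(normal_cdf(a + eps) - normal_cdf(a) for a in grid)
```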

Recall the well-known relation between the arithmetic and geometric means: for each $x, y \geq 0$ and $\theta > 0$ we have

$$xy \leq \frac{\theta x^2}{2} + \frac{y^2}{2\theta}, \tag{6.13}$$

which will be used several times. We will also use the following simple lemma to implement the recursive approach; see Raič (2003).

Lemma 6.2. For any random variable $V$ and for any $a < b$ we have

$$\mathbb{P}[a \leq V \leq b] \leq \frac{b - a}{\sqrt{2\pi}} + 2d_K\big(\mathcal{L}(V), N(0,1)\big). \tag{6.14}$$



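Proof of Theorem 2.5. Let $f = f_{a,\varepsilon}$ be the solution to (6.1); from now on we omit the dependency on $a$ and $\varepsilon$ for better readability. Let $I_1 = I[|G| \leq \alpha, |\tilde D| \leq \beta, |D'| \leq \beta', |S| \leq \gamma]$ and write (6.2) as

$$\mathbb{E}R_1(f) = \mathbb{E}\{(G\tilde D - S)(1 - I_1)(f'(W'') - f'(W))\} + \mathbb{E}\{(G\tilde D - S)I_1(f'(W'') - f'(W))\} =: J_1 + J_2.$$

Using (6.8), the bound $|J_1| \leq r_6$ is immediate. For convenience, let $k := \mathbb{E}|W| + 1$. Using (6.9) and Lemma 6.2,

$$\begin{aligned}
J_2 &\leq \mathbb{E}|(G\tilde D - S)I_1(f'(W'') - f'(W))|\\
&\leq (\alpha\beta + \gamma)k\beta' + (\alpha\beta + \gamma)\varepsilon^{-1}\int_{-\beta'}^{\beta'} \mathbb{P}[a \leq W + u \leq a + \varepsilon]\,du\\
&\leq (\alpha\beta + \gamma)k\beta' + 0.8(\alpha\beta + \gamma)\beta' + 4(\alpha\beta + \gamma)\beta'\varepsilon^{-1}\kappa.
\end{aligned}$$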
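Similarly, let $I_2 = I[|G| \leq \alpha, |D| \leq \beta]$ and write (6.3) as

$$\mathbb{E}R_2(f) = \mathbb{E}\Big\{G(1 - I_2)\int_0^D (f'(W + t) - f'(W))\,dt\Big\} + \mathbb{E}\Big\{GI_2\int_0^D (f'(W + t) - f'(W))\,dt\Big\} =: J_3 + J_4.$$

By (6.8), $|J_3| \leq r'_6$. Using (6.9) and Lemma 6.2,

$$\begin{aligned}
J_4 &\leq \mathbb{E}\Big|GI_2\int_0^D |f'(W + t) - f'(W)|\,dt\Big|\\
&\leq \alpha\int_{-\beta}^{\beta}|t|\,k\,dt + \alpha\varepsilon^{-1}\int_{-\beta}^{\beta}\int_{t_-}^{t_+}\mathbb{P}[a \leq W + u \leq a + \varepsilon]\,du\,dt\\
&\leq \alpha\beta^2k + 0.4\alpha\beta^2 + 2\alpha\beta^2\varepsilon^{-1}\kappa.
\end{aligned}$$

Collecting all the bounds, setting $\varepsilon = 4\alpha\beta^2 + 8(\alpha\beta + \gamma)\beta'$ and using (6.12), we obtain

$$\begin{aligned}
\kappa &\leq r_0 + r_1 + r_2 + r_3 + r_6 + r'_6 + (\alpha\beta + \gamma)(k + 0.8)\beta' + (k + 0.4)\alpha\beta^2\\
&\quad + 0.4\varepsilon + \big(2\alpha\beta^2 + 4(\alpha\beta + \gamma)\beta'\big)\varepsilon^{-1}\kappa\\
&= r_0 + r_1 + r_2 + r_3 + r_6 + r'_6 + (\alpha\beta + \gamma)(k + 4)\beta' + (k + 2)\alpha\beta^2 + 0.5\kappa.
\end{aligned}$$

Solving for $\kappa$ proves the theorem. □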
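The choice of $\varepsilon$ above is exactly what makes the coefficient of $\kappa$ on the right-hand side equal to $1/2$. Written out,

$$\frac{2\alpha\beta^2 + 4(\alpha\beta + \gamma)\beta'}{4\alpha\beta^2 + 8(\alpha\beta + \gamma)\beta'} = \frac{1}{2}, \qquad 0.4\varepsilon = 1.6\,\alpha\beta^2 + 3.2\,(\alpha\beta + \gamma)\beta',$$

so that $(k + 0.4) + 1.6 = k + 2$ and $(k + 0.8) + 3.2 = k + 4$. Moving $0.5\kappa$ to the left-hand side then gives

$$\kappa \leq 2\big(r_0 + r_1 + r_2 + r_3 + r_6 + r'_6 + (\alpha\beta + \gamma)(k + 4)\beta' + (k + 2)\alpha\beta^2\big).$$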

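Proof of Theorem 2.8. Let $f = f_{a,\varepsilon}$ be the solution to (6.1). As we assume $W'' = W$, we have $R_1(f) = 0$. Write (6.3) as

$$\begin{aligned}
\mathbb{E}R_2(f) &= \mathbb{E}\int_{-\infty}^{\infty}(f'(W + t) - f'(W))K_W(t)\,dt\\
&= \mathbb{E}\int_{|t| > 1}(f'(W + t) - f'(W))K_W(t)\,dt\\
&\quad + \mathbb{E}\int_{|t| \leq 1}(f'(W + t) - f'(W))(K_W(t) - K(t))\,dt\\
&\quad + \mathbb{E}\int_{|t| \leq 1}(f'(W + t) - f'(W))K(t)\,dt\\
&=: J_1 + J_2 + J_3.
\end{aligned}$$

Clearly, by (6.8), $|J_1| \leq r_4$. Now let us bound $J_2$. By (6.10),

$$\begin{aligned}
|J_2| &\leq \mathbb{E}\int_{|t| \leq 1}(|W| + 1)|t|\,|K_W(t) - K(t)|\,dt\\
&\quad + \mathbb{E}\int_0^1 I[a - t \leq W \leq a + \varepsilon]\,|K_W(t) - K(t)|\,dt\\
&\quad + \mathbb{E}\int_{-1}^0 I[a \leq W \leq a - t + \varepsilon]\,|K_W(t) - K(t)|\,dt\\
&=: J_{2,1} + J_{2,2} + J_{2,3}.
\end{aligned}$$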
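Using (6.13), we have for any $\theta > 0$ that

$$J_{2,1} \leq \frac{\theta}{2}\,\mathbb{E}\int_{|t| \leq 1}(|W| + 1)^2|t|\,dt + \frac{1}{2\theta}\,\mathbb{E}\int_{|t| \leq 1}|t|\,|K_W(t) - K(t)|^2\,dt = \frac{\theta}{2}\,\mathbb{E}(|W| + 1)^2 + \frac{r_8^2}{2\theta}.$$

Let $q = \big(\mathbb{E}(|W| + 1)^2\big)^{1/2}$ and choose $\theta = r_8/q$, so that

$$J_{2,1} \leq q\,r_8.$$

Let now $\delta = 0.4\varepsilon + 2\kappa$. Then, from Lemma 6.2, we have for $t \geq 0$

$$\mathbb{P}[a - t \leq W \leq a + \varepsilon] \leq \delta + 0.4t. \tag{6.15}$$

Using (6.13),

$$\begin{aligned}
J_{2,2} &\leq \mathbb{E}\int_0^1\Big\{0.5\varepsilon(\delta + 0.4t)^{-1}I[a - t \leq W \leq a + \varepsilon] + 0.5\varepsilon^{-1}(\delta + 0.4t)\big(K_W(t) - K(t)\big)^2\Big\}\,dt\\
&\leq 0.5\varepsilon + 0.5\varepsilon^{-1}\delta\int_0^1 \operatorname{Var}(K_W(t))\,dt + 0.2\varepsilon^{-1}\int_0^1 t\operatorname{Var}(K_W(t))\,dt.
\end{aligned}$$

A similar bound holds for $J_{2,3}$, so that

$$|J_2| \leq q\,r_8 + \varepsilon + 0.5\varepsilon^{-1}\delta r_7 + 0.2\varepsilon^{-1}r_8^2.$$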

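By (6.9),

$$\begin{aligned}
|J_3| &\leq \mathbb{E}\int_{|t| \leq 1}(|W| + 1)|tK(t)|\,dt + \varepsilon^{-1}\int_{|t| \leq 1}\Big|\int_0^t \mathbb{P}[a \leq W + u \leq a + \varepsilon]\,du\Big|\,|K(t)|\,dt\\
&\leq (\mathbb{E}|W| + 1)r_5 + \varepsilon^{-1}\int_{|t| \leq 1}\delta\,|tK(t)|\,dt \leq (\mathbb{E}|W| + 1)r_5 + \varepsilon^{-1}\delta r_5.
\end{aligned}$$

Choose now

$$\varepsilon = 1.4^{-1/2}\big[\kappa(2r_5 + r_7) + 0.2r_8^2\big]^{1/2}.$$

From (6.12) (note that $r_1 = 0$ because $\tilde D = D$), using that $\sqrt{x + y} \leq \sqrt{x} + \sqrt{y}$ and (6.13), we obtain

$$\begin{aligned}
\kappa &\leq r_0 + r_2 + r_3 + 0.4\varepsilon + |J_1| + |J_2| + |J_3|\\
&\leq r_0 + r_2 + r_3 + r_4 + (\mathbb{E}|W| + 1)r_5 + q\,r_8 + 1.4\varepsilon + \varepsilon^{-1}\big(\delta(r_5 + 0.5r_7) + 0.2r_8^2\big)\\
&\leq r_0 + r_2 + r_3 + r_4 + (\mathbb{E}|W| + 1.4)r_5 + 0.2r_7 + q\,r_8 + 1.4\varepsilon + \varepsilon^{-1}\big(\kappa(2r_5 + r_7) + 0.2r_8^2\big)\\
&\leq r_0 + r_2 + r_3 + r_4 + (\mathbb{E}|W| + 1.4)r_5 + 0.2r_7 + q\,r_8 + 2.4\big(\kappa(2r_5 + r_7) + 0.2r_8^2\big)^{1/2}\\
&\leq r_0 + r_2 + r_3 + r_4 + (\mathbb{E}|W| + 1.4)r_5 + 0.2r_7 + (q + 1.1)r_8 + 2.4\big(\kappa(2r_5 + r_7)\big)^{1/2}\\
&\leq r_0 + r_2 + r_3 + r_4 + (\mathbb{E}|W| + 1.4 + 2.4\theta^{-1})r_5 + (0.2 + 1.2\theta^{-1})r_7 + (q + 1.1)r_8 + 1.2\theta\kappa.
\end{aligned}$$

Choosing $\theta = 1/2.4$ and solving for $\kappa$ proves the claim. □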
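The numerical constants in the last chain can be checked mechanically. With $B = \kappa(2r_5 + r_7) + 0.2r_8^2$, the choice $\varepsilon = (B/1.4)^{1/2}$ gives $1.4\varepsilon + B/\varepsilon = 2\sqrt{1.4B} \leq 2.4\sqrt{B}$. The step to $(q + 1.1)r_8$ uses $2.4\sqrt{0.2} \leq 1.1$. The last step is (6.13) with $x = \sqrt{\kappa}$ and $y = \sqrt{2r_5 + r_7}$. A small Python sketch of these checks (function names hypothetical):

```python
import math


def value_at_chosen_eps(B):
    """1.4*eps + B/eps evaluated at eps = (B / 1.4) ** 0.5, as chosen in the proof."""
    eps = math.sqrt(B / 1.4)
    return 1.4 * eps + B / eps


def amgm_upper(kappa, r, theta):
    """Right-hand side of 2.4*sqrt(kappa*r) <= 1.2*theta*kappa + 1.2*r/theta, from (6.13)."""
    return 1.2 * theta * kappa + 1.2 * r / theta
```

With $\theta = 1/2.4$, the coefficient $1.2\theta$ of $\kappa$ equals $0.5$, which is what allows solving for $\kappa$.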




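Proof of Theorem 2.9. Let $f = f_{a,\varepsilon}$ be the solution to (6.1). From (6.9),

$$|R_1(f)| \leq |(S - G\tilde D)(|W| + 1)D'| + |S - G\tilde D|\,\varepsilon^{-1}\Big|\int_0^{D'} I[a \leq W + u \leq a + \varepsilon]\,du\Big|,$$

so that

$$\mathbb{E}|R_1(f)| \leq \mathbb{E}|(S - G\tilde D)(|W| + 1)D'| + \varepsilon^{-1}\,\mathbb{E}\big|(S - G\tilde D)D'\,\delta_\varepsilon\big(\mathcal{L}(W \mid G, \tilde D, D')\big)\big|.$$

Furthermore,

$$|R_2(f)| \leq |G|\int_{D_-}^{D_+}(|W| + 1)(|t| \wedge 1)\,dt + \varepsilon^{-1}|G|\int_{D_-}^{D_+}\int_{t_-}^{t_+} I[a \leq W + u \leq a + \varepsilon]\,du\,dt,$$

hence

$$\mathbb{E}|R_2(f)| \leq 0.5\,\mathbb{E}|G(|W| + 1)D^2| + 0.5\,\varepsilon^{-1}\,\mathbb{E}\big|GD^2\,\delta_\varepsilon\big(\mathcal{L}(W \mid G, D)\big)\big|.$$

Combining these bounds with (6.12) proves the theorem. □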

To prove Lemma 2.10 we first need a simple lemma.

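Lemma 6.3. Let $\beta_{k,l}$, $1 \leq l < k$, $k = 2, \ldots, n$, be non-negative real numbers. Suppose that $a_1 \leq b_1 = 1$ and that there are constants $q \geq 1$ and $p < 1$ such that, for all $k = 2, 3, \ldots, n$, we have $\sum_{l=1}^{k-1}\beta_{k,l} \leq p$,

$$a_k \leq q + \sum_{l=1}^{k-1}\beta_{k,l}\,a_l \qquad\text{and}\qquad b_k = q + p\,b_{k-1}.$$

Then $a_k \leq b_k \leq q/(1 - p)$ for all $1 \leq k \leq n$.

Proof. Note first that $q/(1 - p) \geq 1 = b_1$ and that, for $k \geq 2$,

$$b_k - \frac{q}{1-p} = p\Big(b_{k-1} - \frac{q}{1-p}\Big), \qquad b_2 - b_1 = q + p - 1 \geq 0, \qquad b_{k+1} - b_k = p\,(b_k - b_{k-1}).$$

Hence $b_1, b_2, \ldots$ is increasing with upper bound $q/(1 - p)$. The proof of $a_k \leq b_k$ is now a simple induction on $k$. By assumption $a_1 \leq b_1$, which verifies the base case. Using that $a_l \leq b_l$ for all $1 \leq l \leq k$, we have

$$a_{k+1} \leq q + \sum_{l=1}^{k}\beta_{k+1,l}\,a_l \leq q + \sum_{l=1}^{k}\beta_{k+1,l}\,b_l \leq q + p\,b_k = b_{k+1}. \qquad \square$$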
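Lemma 6.3 is easy to probe numerically. The Python sketch below is an illustration with hypothetical names. It draws random non-negative weights $\beta_{k,l}$ with row sums at most $p$ and takes the worst case allowed by the lemma, namely $a_k = q + \sum_{l<k}\beta_{k,l}a_l$ with $a_1 = 1$. It then runs the comparison recursion $b_k = q + pb_{k-1}$.

```python
import random


def lemma_6_3_sequences(n, q, p, seed=0):
    """Return (a, b) with a_k = q + sum_l beta_{k,l} a_l (worst case of Lemma 6.3)
    for random non-negative beta_{k,l} with row sums <= p, and b_k = q + p b_{k-1}."""
    rng = random.Random(seed)
    a, b = [1.0], [1.0]  # a_1 <= b_1 = 1
    for k in range(2, n + 1):
        weights = [rng.random() for _ in range(k - 1)]
        scale = p * rng.random() / sum(weights)  # row sum is p times a number in [0, 1)
        beta = [scale * wt for wt in weights]
        a.append(q + sum(bl * al for bl, al in zip(beta, a)))
        b.append(q + p * b[-1])
    return a, b
```

In every run, $a_k \leq b_k$, the sequence $b_k$ is non-decreasing, and it stays below $q/(1-p)$, as the lemma asserts.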
1=1 

Proof of Lemma 2.10. First note that (2.8) still holds if we replace $A$ by $\bar A := A \vee 1$. Let $\alpha_1 := 1$ and define a new sequence $a_k = \ldots$ for $1 \leq k \leq n$. Using inequality (2.8) with $\varepsilon = c_n\alpha_n/\alpha_k$ for a constant $c_n > 1$ to be chosen later, we obtain

$$a_k \leq \bar A + 0.4c_n\alpha_n + \sum_{l=1}^{k-1}\beta_{k,l}\,a_l$$

for all $1 \leq k \leq n$, where $\beta_{k,l} = \ldots$ Note that $\sum_{l=1}^{k-1}\beta_{k,l} \leq c_n^{-1}$. Consider now the solution $b_k$ to the recursive equation

$$b_1 = 1, \qquad b_k = \bar A + 0.4c_n\alpha_n + c_n^{-1}b_{k-1} \quad \text{for } 2 \leq k \leq n.$$

Note that $\bar A + 0.4c_n\alpha_n \geq 1$, hence we obtain from Lemma 6.3 that

$$a_n \leq b_n \leq \frac{c_n(\bar A + 0.4c_n\alpha_n)}{c_n - 1}.$$

Minimizing over $c_n > 1$, we can choose

$$c_n = 1 + \frac{\sqrt{2\alpha_n(2\alpha_n + 5\bar A)}}{2\alpha_n} = 1 + \sqrt{1 + \frac{5\bar A}{2\alpha_n}}.$$

Recalling the definition of $a_n$, the claim follows. □
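The choice of $c_n$ can be checked against a brute-force minimisation. With $q = \bar A + 0.4c_n\alpha_n$ and $p = 1/c_n$, Lemma 6.3 gives the bound $c_n(\bar A + 0.4c_n\alpha_n)/(c_n - 1)$. Setting its derivative in $c_n$ to zero gives $0.4\alpha_n c_n^2 - 0.8\alpha_n c_n - \bar A = 0$, whose root above 1 is the value stated. A Python sketch follows; the names are hypothetical and the test values of $\bar A$ and $\alpha_n$ are arbitrary.

```python
import math


def lemma_bound(c, A, alpha):
    """q / (1 - p) from Lemma 6.3 with q = A + 0.4*c*alpha and p = 1/c, for c > 1."""
    return c * (A + 0.4 * c * alpha) / (c - 1)


def chosen_c(A, alpha):
    """The choice c_n = 1 + sqrt(2 alpha (2 alpha + 5 A)) / (2 alpha) from the proof."""
    return 1 + math.sqrt(2 * alpha * (2 * alpha + 5 * A)) / (2 * alpha)
```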